Comprehensive Datadog alerting for workflow infrastructure failures and API errors. All monitors are critical-only - any occurrence triggers an alert.
Linear Ticket: https://linear.app/composio/issue/INT-1221

| Monitor | Trigger | Source |
|---|---|---|
| Workflow Memory High | Memory > 85% | container.memory metrics |
| Workflow OOM Kill | WORKFLOW_OOM_KILL | ECS Poller |
| Workflow Timeout | WORKFLOW_TIMEOUT | ECS Poller |
| Workflow Infra Initiation Failed | WORKFLOW_INFRA_INITIATION_FAILED | API Service |
| Workflow Infra Execution Failed | WORKFLOW_INFRA_FAILURE | ECS Poller |

| Monitor | Type | Source |
|---|---|---|
| API 5xx (Application Logs) | Log-based | Middleware logs status=5XX |
| API Target 5xx (ALB Metric) | Metric-based | httpcode_target_5xx |

| Monitor | Metric | What it means |
|---|---|---|
| ALB 5xx | httpcode_elb_5xx | Load balancer error |
| ALB 502 | httpcode_elb_502 | Backend crashed/invalid response |
| ALB 504 | httpcode_elb_504 | Backend didn't respond in time |

- api/poller/ecs_poller.py
- api/routes/workflows/base.py
- infra/terraform/modules/datadog-infra-alerts/
Re: OOM monitor uses inconsistent event source
False positive. The current code uses a log-based query, not an event query:
```hcl
type  = "log alert"
query = "logs(\"service:${var.service_filter} WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
```
The `WORKFLOW_OOM_KILL` log is emitted by the ECS poller when it detects exit code 137. The monitor does not use `source:aws source_type:ecs` events.
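For reviewers unfamiliar with the flow, here is a minimal sketch of how a poller could turn exit code 137 into the log line the monitor counts. The function names, logger name, and log fields are hypothetical and are not taken from `api/poller/ecs_poller.py`; only the `WORKFLOW_OOM_KILL` marker and the exit code come from this thread.

```python
import logging
from typing import Optional

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("ecs_poller_sketch")

# 137 = 128 + 9, i.e. the container was terminated by SIGKILL.
OOM_EXIT_CODE = 137
OOM_MARKER = "WORKFLOW_OOM_KILL"


def classify_exit(exit_code: Optional[int]) -> Optional[str]:
    """Return the log marker for an exit code, or None if it is not treated as an OOM kill."""
    if exit_code == OOM_EXIT_CODE:
        return OOM_MARKER
    return None


def report_container_exit(task_id: str, exit_code: Optional[int]) -> Optional[str]:
    """Emit the marker as plain log text so a log-based monitor can count it."""
    marker = classify_exit(exit_code)
    if marker is not None:
        # The marker is part of the message text, which is what the log query searches for.
        logger.error("%s task_id=%s exit_code=%s", marker, task_id, exit_code)
    return marker
```

Note that 137 indicates SIGKILL in general, so a real poller may also want to check the stop reason that ECS reports before labelling an exit as an OOM kill.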
/apply-staging
Fixed in latest commit (fbf1d65). Changed the API 5xx monitor query from attribute search (status:5*) to text search ("status=5") to properly match the middleware log format where status is in the message text.
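To illustrate why the query had to change, here is a small sketch of matching a 5xx status that lives in the message text rather than in a parsed attribute. The sample log lines are hypothetical; the real middleware format may differ, and the regex below is stricter than the plain `"status=5"` text search used in the monitor.

```python
import re

# Matches "status=5xx" inside free-form message text (three-digit codes only).
STATUS_5XX = re.compile(r"\bstatus=5\d{2}\b")


def is_5xx_log(message: str) -> bool:
    """Return True when the message text carries a 5xx status."""
    return STATUS_5XX.search(message) is not None


# Hypothetical middleware log lines, for illustration only.
samples = [
    "GET /workflows status=502 duration_ms=31",
    "GET /workflows status=200 duration_ms=12",
]
matches = [line for line in samples if is_5xx_log(line)]
```

An attribute search such as `status:5*` only works when the status has been parsed into its own attribute, which is why it missed these lines.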
/apply-staging
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only: real changes (new images, env vars) are still detected correctly.
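As a rough way to check whether a given task definition diff is order-only, the sketch below compares two container definition lists after normalizing order. This is not how the provider computes its diff; the helper names and sample data are hypothetical, and the sketch treats every list as unordered, which is wrong for order-sensitive fields such as `command`.

```python
import json
from typing import Any


def normalize(value: Any) -> Any:
    """Recursively sort dict keys and list items so ordering differences disappear."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # Assumption: list order carries no meaning (not true for e.g. "command").
        return sorted((normalize(item) for item in value),
                      key=lambda item: json.dumps(item, sort_keys=True))
    return value


def is_cosmetic_diff(stored: list, returned: list) -> bool:
    """True when the two definitions differ only in list ordering."""
    return stored != returned and normalize(stored) == normalize(returned)


# Hypothetical container definitions, for illustration only.
stored = [{"name": "api", "image": "api:1",
           "environment": [{"name": "A", "value": "1"}, {"name": "B", "value": "2"}]}]
reordered = [{"name": "api", "image": "api:1",
              "environment": [{"name": "B", "value": "2"}, {"name": "A", "value": "1"}]}]
new_image = [{"name": "api", "image": "api:2",
              "environment": [{"name": "A", "value": "1"}, {"name": "B", "value": "2"}]}]
```

With these inputs, `is_cosmetic_diff(stored, reordered)` is true while `is_cosmetic_diff(stored, new_image)` is false, matching the claim that real changes still show up.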
data.doppler_secrets.github: Reading...
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
module.github_actions_role.data.tls_certificate.github: Reading...
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.github_actions_role.data.tls_certificate.github: Read complete after 1s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
data.doppler_secrets.github: Read complete after 1s [id=integrator.prod]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 1s [id=-]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.ecs_api.data.aws_vpc.selected: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.private_subnets.data.aws_availability_zones.available: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.ecs_api.data.aws_ecs_cluster.default: Reading...
data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.private_subnets.data.aws_vpc.default: Reading...
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
module.private_subnets.data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.ecs_api.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 0s [id=integration-aws-available-namespaces]
module.private_subnets.data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 0s [id=integration-aws-iam-permissions]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 1s [id=integrator.prod]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 1s [id=ami-07b93c1395cedf7c5]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
data.aws_vpc.default: Reading...
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 0s [id=3755515486]
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.rds.data.aws_vpc.selected: Reading...
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 0s [id=1846264275]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=306319572]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=3858118069]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 0s [id=616917803]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3A5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api \\\"status=5\\\"\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [PROD] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api"
name = "integrator-prod-api"
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
"Name" = "integrator-prod-api-service"
}
~ task_definition =
... (output truncated)
/apply-staging
⚠️ Production changes will be applied after merge to next branch.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
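To make the "cosmetic only" claim concrete, here is a minimal Python sketch of an order-insensitive comparison of two container-definition documents. It is illustrative only and is not the provider's actual diff logic. The helper names (`canon`, `same_definition`) and the sample JSON are hypothetical. It assumes that lists of objects (such as `environment`) are order-insensitive, while lists of scalars (such as `command`) are order-sensitive.

```python
import json


def canon(value):
    """Return a copy of value with every list of objects sorted canonically.

    Assumption: lists of dicts (e.g. environment entries) carry no meaningful
    order, while lists of scalars (e.g. command arguments) do, so only the
    former are sorted.
    """
    if isinstance(value, dict):
        return {key: canon(val) for key, val in value.items()}
    if isinstance(value, list):
        items = [canon(item) for item in value]
        if items and all(isinstance(item, dict) for item in items):
            return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
        return items
    return value


def same_definition(stored: str, returned: str) -> bool:
    """Compare two container-definition JSON strings, ignoring cosmetic order."""
    left = json.dumps(canon(json.loads(stored)), sort_keys=True)
    right = json.dumps(canon(json.loads(returned)), sort_keys=True)
    return left == right


# Hypothetical sample: same content, different key and list ordering.
stored = (
    '[{"name":"api","image":"repo/api:1",'
    '"environment":[{"name":"A","value":"1"},{"name":"B","value":"2"}],'
    '"command":["run","--fast"]}]'
)
returned = (
    '[{"image":"repo/api:1","name":"api","command":["run","--fast"],'
    '"environment":[{"name":"B","value":"2"},{"name":"A","value":"1"}]}]'
)
new_image = returned.replace("repo/api:1", "repo/api:2")

if __name__ == "__main__":
    print(same_definition(stored, returned))   # reordering only
    print(same_definition(stored, new_image))  # real change: new image tag
```

Under these assumptions a pure reordering compares equal, while a new image tag or a changed env var value still shows up as a difference. This matches the note's point that real changes remain detectable.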
module.github_actions_role.data.tls_certificate.github: Reading...
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.doppler_aws_secrets.data.external.env_vars: Read complete after 1s [id=-]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.staging]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.dev_launch_template.aws_iam_role_policy.lambda_hibernate_instances[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role:hibernate-instances]
module.dev_launch_template.aws_lambda_function.auto_hibernate[0]: Refreshing state... [id=integrator-dev-auto-hibernate]
module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0]: Refreshing state... [id=integrator-dev-idle-alarm]
module.dev_launch_template.aws_launch_template.dev_instance: Refreshing state... [id=lt-0e0667353277bd721]
module.dev_launch_template.aws_iam_role.dev_instances: Refreshing state... [id=integrator-dev-instances-role]
module.dev_launch_template.aws_iam_role_policy_attachment.cloudwatch_agent: Refreshing state... [id=integrator-dev-instances-role-20260209154824412400000002]
module.dev_launch_template.aws_iam_role.lambda_hibernate[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role]
module.dev_launch_template.aws_iam_role_policy_attachment.ssm_managed_instance: Refreshing state... [id=integrator-dev-instances-role-20260209154824396500000001]
module.dev_launch_template.aws_iam_role_policy.parameter_store_access: Refreshing state... [id=integrator-dev-instances-role:parameter-store-access]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.dev_launch_template.aws_lambda_permission.eventbridge[0]: Refreshing state... [id=AllowExecutionFromEventBridge]
module.dev_launch_template.aws_iam_instance_profile.dev_instances: Refreshing state... [id=integrator-dev-instances-profile]
module.dev_launch_template.aws_iam_role_policy.ecr_access: Refreshing state... [id=integrator-dev-instances-role:ecr-access]
module.dev_launch_template.aws_cloudwatch_event_target.lambda[0]: Refreshing state... [id=integrator-dev-idle-alarm-HibernateLambda]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
data.aws_vpc.default: Reading...
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
data.aws_iam_openid_connect_provider.github: Reading...
module.ecs_api.data.aws_vpc.selected: Reading...
data.aws_iam_openid_connect_provider.github: Read complete after 0s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.ecs_api.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
data.aws_subnets.public: Reading...
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 0s [id=integrator-api]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_route_tables.main: Reading...
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
module.rds.data.aws_vpc.selected: Reading...
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.rds.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4989053128215798288]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 1s [id=us-east-1]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 1s [id=integrator-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3A5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api \\\"status=5\\\"\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0] will be destroyed
# (because aws_cloudwatch_event_rule.idle_alarm is not in configuration)
- resource "aws_cloudwatch_event_rule" "idle_alarm" {
- arn = "arn:aws:events:us-east-1:256586139593:rule/integrator-dev-idle-alarm" -> null
- description = "Trigger hibernation lambda when dev instances are idle" -> null
- event_bus_name = "default" -> null
- event_pattern = jsonencode(
{
- detail = {
- alarmName = [
- {
- prefix = "integrator-dev-"
},
]
- state = {
- value = [
- "ALARM",
]
}
}
- detail-type = [
- "CloudWatch Alarm State Change",
]
- source = [
- "aws.cloudwatch",
]
}
) -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm" -> null
- is_enabled = true -> null
- name = "integrator-dev-idle-alarm" -> null
- state = "ENABLED" -> null
- tags = {} -> null
- tags_all = {} -> null
}
# module.dev_launch_template.aws_cloudwatch_event_target.lambda[0] will be destroyed
# (because aws_cloudwatch_event_target.lambda is not in configuration)
- resource "aws_cloudwatch_event_target" "lambda" {
- arn = "arn:aws:lambda:us-east-1:256586139593:function:integrator-dev-auto-hibernate" -> null
- event_bus_name = "default" -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm-HibernateLambda" -> null
- rule = "integrator-dev-idle-alarm" -> null
- target_id = "HibernateLambda" -> null
}
# module.dev_launch_template.aws_iam_instance_profile.dev_instances will be destroyed
# (because aws_iam_instance_profile.dev_instances is not in configuration)
- resource "aws_iam_instance_profile" "dev_instances" {
- arn = "arn:aws:iam::256586139593:instance-profile/integrator-dev-instances-profile" -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- id = "integrator-dev-instances-profile" -> null
- name = "integrator-dev-instances-profile" -> null
- path = "/" -> null
- role = "integrator-dev-instances-role" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- unique_id = "AIPATXPNZY7EU2R6O3RLV" -> null
}
# module.dev_launch_template.aws_iam_role.dev_instances will be destroyed
# (because aws_iam_role.dev_instances is not in configuration)
- resource "aws_iam_role" "dev_instances" {
- arn = "arn:aws:iam::256586139593:role/integrator-dev-instances-role" -> null
- assume_role_policy = jsonencode(
{
- Statement = [
- {
- Action = "sts:AssumeRole"
- Effect = "Allow"
- Principal = {
- Service = "ec2.amazonaws.com"
}
},
]
- Version = "2012-10-17"
}
) -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- force_detach_policies = false -> null
- id = "integrator-dev-instances-role" -> null
- managed_policy_arns = [
- "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
- "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
] -> null
- max_session_duration = 3600 -> null
- name = "integrator-dev-instances-role" -> null
- path = "/" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- unique_id = "AROATXPNZY7ESPFWJ4H5B" -> null
- inline_policy {
- name = "ecr-access" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = [
... (output truncated)
🔵 Comment /apply-staging to apply these changes.
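For reviewers: the three log-based workflow monitors in the plan above match on the text markers `WORKFLOW_OOM_KILL`, `WORKFLOW_TIMEOUT` and `WORKFLOW_INFRA_FAILURE`. Below is a minimal sketch of how a poller could pick one of those markers for a task that stopped before its workflow completed. It is not the code in `api/poller/ecs_poller.py`: `StoppedTask`, `classify_stopped_task` and `format_log_line` are hypothetical names, the check order is an assumption, and the `key=value` message layout is only a guess based on the attribute names listed in the monitor messages.

```python
from dataclasses import dataclass
from typing import Optional

OOM_EXIT_CODE = 137  # exit code named in the OOM monitor message (SIGKILL)


@dataclass
class StoppedTask:
    """Hypothetical summary of a stopped ECS task.

    Field names mirror the log attributes listed in the monitor messages.
    """
    workflow_id: str
    exit_code: Optional[int]
    stopped_reason: str
    duration_hours: float
    timeout_hours: float


def classify_stopped_task(task: StoppedTask) -> str:
    """Return the log marker for a task that stopped before completion.

    Assumed order: OOM first, then timeout, then the catch-all that the
    execution-failure monitor describes as "not OOM or timeout".
    """
    if task.exit_code == OOM_EXIT_CODE:
        return "WORKFLOW_OOM_KILL"
    if task.duration_hours >= task.timeout_hours:
        return "WORKFLOW_TIMEOUT"
    return "WORKFLOW_INFRA_FAILURE"


def format_log_line(task: StoppedTask) -> str:
    """Build a message whose text contains the marker the monitors search for."""
    marker = classify_stopped_task(task)
    return (
        f"{marker} workflow_id={task.workflow_id} "
        f"exit_code={task.exit_code} duration_hours={task.duration_hours}"
    )
```

Because each monitor query is `count > 0` over the last 30 minutes, a single line containing the marker is enough to trigger the corresponding alert.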
...(earlier output truncated)...
# module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs will be destroyed
# (because aws_cloudwatch_log_group.migrations_lambda_logs is not in configuration)
- resource "aws_cloudwatch_log_group" "migrations_lambda_logs" {
- arn = "arn:aws:logs:us-east-1:256586139593:log-group:/aws/lambda/integrator-staging-migrations" -> null
- id = "/aws/lambda/integrator-staging-migrations" -> null
- log_group_class = "STANDARD" -> null
- name = "/aws/lambda/integrator-staging-migrations" -> null
- retention_in_days = 7 -> null
- skip_destroy = false -> null
- tags = {
- "Environment" = "staging"
- "ManagedBy" = "Terraform"
} -> null
- tags_all = {
- "Environment" = "staging"
- "ManagedBy" = "Terraform"
} -> null
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api"
name = "integrator-staging-api"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:125" -> (known after apply)
# (16 unchanged attributes hidden)
# (4 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_service.poller will be updated in-place
~ resource "aws_ecs_service" "poller" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller"
name = "integrator-staging-poller"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:111" -> (known after apply)
# (16 unchanged attributes hidden)
# (3 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_task_definition.api must be replaced
-/+ resource "aws_ecs_task_definition" "api" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:125" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-api" -> (known after apply)
~ revision = 125 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_ecs_task_definition.poller must be replaced
-/+ resource "aws_ecs_task_definition" "poller" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:111" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-poller" -> (known after apply)
~ revision = 111 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_iam_policy.launch_background_runner will be updated in-place
~ resource "aws_iam_policy" "launch_background_runner" {
id = "arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy"
name = "integrator-staging-api-launch-tasks-policy"
~ policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "ecs:RunTask",
- "ecs:DescribeTasks",
- "ecs:StopTask",
- "ecs:ListTasks",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:*",
- "arn:aws:ecs:us-east-1:256586139593:task/default/*",
]
},
- {
- Action = [
- "iam:PassRole",
]
- Condition = {
- StringEquals = {
- "iam:PassedToService" = "ecs-tasks.amazonaws.com"
}
}
- Effect = "Allow"
- Resource = [
- "arn:aws:iam::256586139593:role/integrator-staging-ecs-task-execution-role",
- "arn:aws:iam::256586139593:role/integrator-staging-ecs-task-role",
]
},
]
- Version = "2012-10-17"
}
) -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-launch-tasks-policy"
}
# (6 unchanged attributes hidden)
}
# module.ecs_api.aws_iam_role.migrations_lambda will be destroyed
# (because aws_iam_role.migrations_lambda is not in configuration)
- resource "aws_iam_role" "migrations_lambda" {
- arn = "arn:aws:iam::256586139593:role/integrator-staging-migrations-lambda-role" -> null
- assume_role_policy = jsonencode(
{
- Statement = [
- {
- Action = "sts:AssumeRole"
- Effect = "Allow"
- Principal = {
- Service = "lambda.amazonaws.com"
}
},
]
- Version = "2012-10-17"
}
) -> null
- create_date = "2026-02-02T10:56:37Z" -> null
- force_detach_policies = false -> null
- id = "integrator-staging-migrations-lambda-role" -> null
- managed_policy_arns = [
- "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
- "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
] -> null
- max_session_duration = 3600 -> null
- name = "integrator-staging-migrations-lambda-role" -> null
- path = "/" -> null
- tags = {
- "Environment" = "staging"
- "ManagedBy" = "Terraform"
} -> null
- tags_all = {
- "Environment" = "staging"
- "ManagedBy" = "Terraform"
} -> null
- unique_id = "AROATXPNZY7E47WHGBW6K" -> null
- inline_policy {
- name = "integrator-staging-migrations-lambda-access" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "secretsmanager:GetSecretValue",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0",
]
},
- {
- Action = [
- "s3:GetObject",
- "s3:ListBucket",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:s3:::integrator-staging-migrations",
- "arn:aws:s3:::integrator-staging-migrations/*",
]
},
]
- Version = "2012-10-17"
}
) -> null
}
}
# module.ecs_api.aws_iam_role_policy.migrations_lambda_access will be destroyed
# (because aws_iam_role_policy.migrations_lambda_access is not in configuration)
- resource "aws_iam_role_policy" "migrations_lambda_access" {
- id = "integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access" -> null
- name = "integrator-staging-migrations-lambda-access" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "secretsmanager:GetSecretValue",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0",
]
},
- {
- Action = [
- "s3:GetObject",
- "s3:ListBucket",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:s3:::integrator-staging-migrations",
- "arn:aws:s3:::integrator-staging-migrations/*",
]
},
]
- Version = "2012-10-17"
}
) -> null
- role = "integrator-staging-migrations-lambda-role" -> null
}
# module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic will be destroyed
# (because aws_iam_role_policy_attachment.migrations_lambda_basic is not in configuration)
- resource "aws_iam_role_policy_attachment" "migrations_lambda_basic" {
- id = "integrator-staging-migrations-lambda-role-20260202105638041100000002" -> null
- policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" -> null
- role = "integrator-staging-migrations-lambda-role" -> null
}
# module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc will be destroyed
# (because aws_iam_role_policy_attachment.migrations_lambda_vpc is not in configuration)
- resource "aws_iam_role_policy_attachment" "migrations_lambda_vpc" {
- id = "integrator-staging-migrations-lambda-role-20260202105638033000000001" -> null
- policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole" -> null
- role = "integrator-staging-migrations-lambda-role" -> null
}
# module.ecs_api.aws_lambda_function.migrations will be destroyed
# (because aws_lambda_function.migrations is not in configuration)
- resource "aws_lambda_function" "migrations" {
- architectures = [
- "x86_64",
] -> null
- arn = "arn:aws:lambda:us-east-1:256586139593:function:integrator-staging-migrations" -> null
- code_sha256 = "f090db9a650b3a53f717f4862d0e8ad13de91ef6f0f46ada11d3dd6d697e52f5" -> null
- function_name = "integrator-staging-migrations" -> null
- id = "integrator-staging-migrations" -> null
- image_uri = "256586139593.dkr.ecr.us-east-1.amazonaws.com/integrator-api:staging-migrations" -> null
- invoke_arn = "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:256586139593:function:integrator-staging-migrations/invocations" -> null
- last_modified = "2026-02-02T10:56:54.038+0000" -> null
- layers = [] -> null
- memory_size = 512 -> null
- package_type = "Image" -> null
- publish = false -> null
- qualified_arn = "arn:aws:lambda:us-east-1:256586139593:function:integrator-staging-migrations:$LATEST" -> null
- qualified_invoke_arn = "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:256586139593:function:integrator-staging-migrations:$LATEST/invocations" -> null
- reserved_concurrent_executions = -1 -> null
- role = "arn:aws:iam::256586139593:role/integrator-staging-migrations-lambda-role" -> null
- skip_destroy = false -> null
- source_code_size = 0 -> null
- tags = {
- "Environment" = "staging"
- "ManagedBy" = "Terraform"
} -> null
- tags_all = {
- "Environment" = "staging"
- "ManagedBy" = "Terraform"
} -> null
- timeout = 300 -> null
- version = "$LATEST" -> null
- environment {
- variables = {
- "DB_HOST" = "integrator-staging-postgres.cyx4mqkwmkeq.us-east-1.rds.amazonaws.com"
- "DB_NAME" = "integrator_staging"
- "DB_PASSWORD_SECRET_ARN" = "arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0"
- "DB_PORT" = "5432"
- "DB_SSLMODE" = "require"
- "DB_USER" = (sensitive value)
- "MIGRATIONS_BUCKET" = "integrator-staging-migrations"
- "MIGRATIONS_PREFIX" = "versions/"
} -> null
}
- ephemeral_storage {
- size = 512 -> null
}
- logging_config {
- log_format = "Text" -> null
- log_group = "/aws/lambda/integrator-staging-migrations" -> null
}
- tracing_config {
- mode = "PassThrough" -> null
}
}
# module.ecs_api.aws_s3_bucket.migrations will be destroyed
# (because aws_s3_bucket.migrations is not in configuration)
- resource "aws_s3_bucket" "migrations" {
- arn = "arn:aws:s3:::integrator-staging-migrations" -> null
- bucket = "integrator-staging-migrations" -> null
- bucket_domain_name = "integrator-staging-migrations.s3.amazonaws.com" -> null
- bucket_regional_domain_name = "integrator-staging-migrations.s3.us-east-1.amazonaws.com" -> null
- force_destroy = false -> null
- hosted_zone_id = "Z3AQBSTGFYJSTF" -> null
- id = "integrator-staging-migrations" -> null
- object_lock_enabled = false -> null
- region = "us-east-1" -> null
- request_payer = "BucketOwner" -> null
- tags = {
- "Environment" = "staging"
- "ManagedBy" = "Terraform"
} -> null
- tags_all = {
- "Environment" = "staging"
- "ManagedBy" = "Terraform"
} -> null
- grant {
- id = "0c593b79d7b8f77aad80e036805eca2eb80a11966b2fd558a8e980b7e6449751" -> null
- permissions = [
- "FULL_CONTROL",
] -> null
- type = "CanonicalUser" -> null
}
- server_side_encryption_configuration {
- rule {
- bucket_key_enabled = false -> null
- apply_server_side_encryption_by_default {
- sse_algorithm = "AES256" -> null
}
}
}
- versioning {
- enabled = false -> null
- mfa_delete = false -> null
}
}
# module.ecs_api.aws_s3_bucket_public_access_block.migrations will be destroyed
# (because aws_s3_bucket_public_access_block.migrations is not in configuration)
- resource "aws_s3_bucket_public_access_block" "migrations" {
- block_public_acls = true -> null
- block_public_policy = true -> null
- bucket = "integrator-staging-migrations" -> null
- id = "integrator-staging-migrations" -> null
- ignore_public_acls = true -> null
- restrict_public_buckets = true -> null
}
# module.ecs_background_runner.aws_ecs_task_definition.background_runner must be replaced
-/+ resource "aws_ecs_task_definition" "background_runner" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:126" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-background-runner" -> (known after apply)
~ revision = 126 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-background-runner"
}
# (10 unchanged attributes hidden)
}
# module.github_actions_role.aws_iam_role_policy.cloudwatch_logs will be updated in-place
~ resource "aws_iam_role_policy" "cloudwatch_logs" {
id = "integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy"
name = "integrator-github-actions-staging-cloudwatch-logs-policy"
~ policy = jsonencode(
~ {
~ Statement = [
~ {
~ Resource = [
# (1 unchanged element hidden)
"arn:aws:logs:us-east-1:256586139593:log-group:/ecs/integrator/staging/migrations:*",
- "arn:aws:logs:us-east-1:256586139593:log-group:/aws/lambda/integrator-staging-migrations",
- "arn:aws:logs:us-east-1:256586139593:log-group:/aws/lambda/integrator-staging-migrations:*",
]
# (2 unchanged attributes hidden)
},
~ {
~ Resource = [
"arn:aws:logs:us-east-1:256586139593:log-group:/ecs/integrator/staging/migrations:*",
- "arn:aws:logs:us-east-1:256586139593:log-group:/aws/lambda/integrator-staging-migrations:*",
]
# (2 unchanged attributes hidden)
},
]
# (1 unchanged attribute hidden)
}
)
# (1 unchanged attribute hidden)
}
# module.github_actions_role.aws_iam_role_policy.lambda will be destroyed
# (because aws_iam_role_policy.lambda is not in configuration)
- resource "aws_iam_role_policy" "lambda" {
- id = "integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy" -> null
- name = "integrator-github-actions-staging-lambda-policy" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "lambda:InvokeFunction",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:lambda:us-east-1:256586139593:function:integrator-staging-migrations",
]
},
- {
- Action = [
- "s3:PutObject",
- "s3:DeleteObject",
- "s3:ListBucket",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:s3:::integrator-staging-migrations",
- "arn:aws:s3:::integrator-staging-migrations/*",
]
},
]
- Version = "2012-10-17"
}
) -> null
- role = "integrator-github-actions-staging" -> null
}
Plan: 12 to add, 5 to change, 14 to destroy.
Changes to Outputs:
~ api_task_definition_arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:125" -> (known after apply)
~ ecs_task_definition_arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:126" -> (known after apply)
module.ecs_api.aws_lambda_function.migrations: Destroying... [id=integrator-staging-migrations]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Destroying... [id=integrator-staging-migrations-lambda-role-20260202105638041100000002]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Destroying... [id=integrator-staging-migrations]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Destroying... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.github_actions_role.aws_iam_role_policy.lambda: Destroying... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.ecs_api.aws_ecs_task_definition.poller: Destroying... [id=integrator-staging-poller]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Destroying... [id=integrator-staging-migrations-lambda-role-20260202105638033000000001]
module.ecs_api.aws_ecs_task_definition.api: Destroying... [id=integrator-staging-api]
module.github_actions_role.aws_iam_role_policy.lambda: Destruction complete after 0s
module.ecs_api.aws_ecs_task_definition.api: Destruction complete after 0s
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Destruction complete after 0s
module.ecs_api.aws_ecs_task_definition.poller: Destruction complete after 0s
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Destroying... [id=integrator-staging-background-runner]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Destruction complete after 0s
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Destruction complete after 0s
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Modifying... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Destruction complete after 0s
module.ecs_api.aws_lambda_function.migrations: Destruction complete after 0s
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Modifications complete after 0s [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Destroying... [id=/aws/lambda/integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.api_5xx_logs: Creating...
module.datadog_infra_alerts.datadog_monitor.alb_502[0]: Creating...
module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0]: Creating...
module.datadog_infra_alerts.datadog_monitor.alb_504[0]: Creating...
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Modifying... [id=254279883]
module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed: Creating...
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Destruction complete after 0s
module.ecs_api.aws_iam_role.migrations_lambda: Destroying... [id=integrator-staging-migrations-lambda-role]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Destruction complete after 0s
module.ecs_api.aws_s3_bucket.migrations: Destroying... [id=integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.alb_502[0]: Creation complete after 0s [id=256039024]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Creating...
module.datadog_infra_alerts.datadog_monitor.api_5xx_logs: Creation complete after 1s [id=256039025]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Modifications complete after 1s [id=254279883]
module.ecs_api.aws_iam_role.migrations_lambda: Destruction complete after 1s
module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed: Creation complete after 1s [id=256039026]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Destroying... [id=52m-cxz-m3d]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Creation complete after 0s [id=integrator-staging-background-runner]
module.ecs_api.aws_ecs_task_definition.api: Creating...
module.ecs_api.aws_ecs_task_definition.poller: Creating...
module.datadog_infra_alerts.datadog_monitor.alb_504[0]: Creation complete after 1s [id=256039027]
module.datadog_infra_alerts.datadog_monitor.workflow_memory_high: Creating...
module.ecs_api.aws_ecs_task_definition.poller: Creation complete after 0s [id=integrator-staging-poller]
module.ecs_api.aws_ecs_task_definition.api: Creation complete after 0s [id=integrator-staging-api]
module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed: Creating...
module.ecs_api.aws_ecs_service.poller: Modifying... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Modifying... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill: Creating...
module.datadog_dashboards.datadog_dashboard.cost[0]: Destroying... [id=6ek-svv-a62]
module.datadog_infra_alerts.datadog_monitor.workflow_timeout: Creating...
module.datadog_infra_alerts.datadog_monitor.workflow_memory_high: Creation complete after 0s [id=256039028]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Destruction complete after 0s
module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed: Creation complete after 0s [id=256039033]
module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill: Creation complete after 0s [id=256039034]
module.datadog_infra_alerts.datadog_monitor.workflow_timeout: Creation complete after 1s [id=256039035]
module.datadog_dashboards.datadog_dashboard.cost[0]: Destruction complete after 1s
module.ecs_api.aws_ecs_service.poller: Modifications complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Modifications complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0]: Creation complete after 2s [id=256039036]
Error: deleting S3 Bucket (integrator-staging-migrations): operation error S3: DeleteBucket, https response error StatusCode: 409, RequestID: ZP85QHX59HH91Z33, HostID: bUsyyega9YSUI0PTsfxpGLK83KAOHCrNrfdT8AHUE4YGBd4eP6R03x3gN2hY5BibiRjIFtYuCa8=, api error BucketNotEmpty: The bucket you tried to delete is not empty
⚠️ Apply failed. Please check the output above.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
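Re: the `BucketNotEmpty` failure in the staging apply: S3 refuses `DeleteBucket` with a 409 while the bucket still holds objects, so `module.ecs_api.aws_s3_bucket.migrations` cannot be destroyed until it is emptied. One possible way through, sketched under assumptions: set `force_destroy = true` on the bucket and apply that change while the resource is still declared, because the destroy uses the value recorded in state. The resource body below is hypothetical (the real definition in the `ecs_api` module is not shown in this thread, and the bucket name is probably derived from variables rather than hard-coded); only the `force_destroy` argument is the point.

```hcl
# Hypothetical sketch of the bucket inside the ecs_api module.
# Step 1: apply with force_destroy = true while the resource still exists in config.
# Step 2: remove the resource in a follow-up change; the destroy then empties the bucket first.
resource "aws_s3_bucket" "migrations" {
  # Assumption: name copied from the error message above.
  bucket = "integrator-staging-migrations"

  # Lets Terraform delete all objects in the bucket as part of destroying it.
  force_destroy = true
}
```

Alternatively, the bucket can be emptied by hand before re-running the apply. Either way, the Datadog monitors in this apply were already created, so only the bucket destroy should remain pending.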
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 0s [id=integration-aws-iam-permissions]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
data.doppler_secrets.github: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 0s [id=integration-aws-available-namespaces]
module.github_actions_role.data.tls_certificate.github: Reading...
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
module.github_actions_role.data.tls_certificate.github: Read complete after 1s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
data.doppler_secrets.github: Read complete after 1s [id=integrator.prod]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.doppler_aws_secrets.data.external.env_vars: Read complete after 1s [id=-]
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.private_subnets.data.aws_vpc.default: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
data.aws_caller_identity.current: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.private_subnets.data.aws_availability_zones.available: Reading...
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.private_subnets.data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 0s [id=ami-07b93c1395cedf7c5]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
module.rds.data.aws_region.current: Reading...
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_vpc.selected: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.prod]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.private_subnets.data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.data.aws_subnets.public: Reading...
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
data.aws_vpc.default: Reading...
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 0s [id=3755515486]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
module.rds.data.aws_vpc.selected: Reading...
data.aws_subnets.public: Reading...
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 0s [id=616917803]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 0s [id=1846264275]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=306319572]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=3858118069]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 1s [id=256586139593]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3A5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [PROD] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api"
name = "integrator-prod-api"
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
"Name" = "integrator-prod-api-service"
}
~ task_definition = "arn:a
... (output truncated)
⚠️ Production changes will be applied after merge to next branch.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
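
To illustrate the kind of ordering mismatch the note above describes, here is a minimal Python sketch. It is not the AWS provider's actual comparison logic: the `normalize` helper, the container names, and the image tags are all hypothetical. It only shows how two semantically identical container definition lists compare as different when their ordering differs, and how a real change (a new image) is still visible once ordering is normalized.

```python
import copy


def normalize(container_defs):
    """Return a deep copy with containers and their env vars sorted by name.

    Hypothetical helper for illustration only.
    """
    normalized = []
    for container in sorted(container_defs, key=lambda c: c["name"]):
        container = copy.deepcopy(container)
        container["environment"] = sorted(
            container.get("environment", []), key=lambda e: e["name"]
        )
        normalized.append(container)
    return normalized


# What the state file might hold (made-up sample data).
stored = [
    {"name": "api", "image": "api:1", "environment": [
        {"name": "A", "value": "1"}, {"name": "B", "value": "2"}]},
    {"name": "sidecar", "image": "sidecar:7", "environment": []},
]

# The same definitions as the API might return them: same content, other order.
returned = [
    {"name": "sidecar", "image": "sidecar:7", "environment": []},
    {"name": "api", "image": "api:1", "environment": [
        {"name": "B", "value": "2"}, {"name": "A", "value": "1"}]},
]

# An order-sensitive comparison reports a difference (the cosmetic diff).
naive_diff = stored != returned

# After normalizing order, the two are equal.
normalized_diff = normalize(stored) != normalize(returned)

# A genuine change (new image tag) is still detected after normalization.
changed = copy.deepcopy(returned)
changed[1]["image"] = "api:2"
real_change_detected = normalize(stored) != normalize(changed)

print(naive_diff, normalized_diff, real_change_detected)
```

Under these assumptions the order-only difference disappears after normalization while the image change does not, which matches the note's claim that the recurring replacement is cosmetic and real changes remain visible.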
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.github_actions_role.data.tls_certificate.github: Reading...
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.staging]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.dev_launch_template.aws_cloudwatch_event_target.lambda[0]: Refreshing state... [id=integrator-dev-idle-alarm-HibernateLambda]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.dev_launch_template.aws_iam_instance_profile.dev_instances: Refreshing state... [id=integrator-dev-instances-profile]
data.aws_vpc.default: Reading...
module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0]: Refreshing state... [id=integrator-dev-idle-alarm]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
module.dev_launch_template.aws_launch_template.dev_instance: Refreshing state... [id=lt-0e0667353277bd721]
module.dev_launch_template.aws_iam_role_policy_attachment.ssm_managed_instance: Refreshing state... [id=integrator-dev-instances-role-20260209154824396500000001]
module.dev_launch_template.aws_iam_role.lambda_hibernate[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.dev_launch_template.aws_iam_role_policy.ecr_access: Refreshing state... [id=integrator-dev-instances-role:ecr-access]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.dev_launch_template.aws_iam_role_policy.lambda_hibernate_instances[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role:hibernate-instances]
module.dev_launch_template.aws_iam_role_policy.parameter_store_access: Refreshing state... [id=integrator-dev-instances-role:parameter-store-access]
module.dev_launch_template.aws_iam_role.dev_instances: Refreshing state... [id=integrator-dev-instances-role]
module.dev_launch_template.aws_iam_role_policy_attachment.cloudwatch_agent: Refreshing state... [id=integrator-dev-instances-role-20260209154824412400000002]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.dev_launch_template.aws_lambda_permission.eventbridge[0]: Refreshing state... [id=AllowExecutionFromEventBridge]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 1s [id=integrator-api]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Reading...
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
data.aws_iam_openid_connect_provider.github: Reading...
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_vpc.selected: Reading...
data.aws_iam_openid_connect_provider.github: Read complete after 1s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
module.dev_launch_template.aws_lambda_function.auto_hibernate[0]: Refreshing state... [id=integrator-dev-auto-hibernate]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
data.aws_subnets.public: Reading...
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.rds.data.aws_vpc.selected: Reading...
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.rds.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4989053128215798288]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 1s [id=us-east-1]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 2s [id=integrator-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3A5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0] will be destroyed
# (because aws_cloudwatch_event_rule.idle_alarm is not in configuration)
- resource "aws_cloudwatch_event_rule" "idle_alarm" {
- arn = "arn:aws:events:us-east-1:256586139593:rule/integrator-dev-idle-alarm" -> null
- description = "Trigger hibernation lambda when dev instances are idle" -> null
- event_bus_name = "default" -> null
- event_pattern = jsonencode(
{
- detail = {
- alarmName = [
- {
- prefix = "integrator-dev-"
},
]
- state = {
- value = [
- "ALARM",
]
}
}
- detail-type = [
- "CloudWatch Alarm State Change",
]
- source = [
- "aws.cloudwatch",
]
}
) -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm" -> null
- is_enabled = true -> null
- name = "integrator-dev-idle-alarm" -> null
- state = "ENABLED" -> null
- tags = {} -> null
- tags_all = {} -> null
}
# module.dev_launch_template.aws_cloudwatch_event_target.lambda[0] will be destroyed
# (because aws_cloudwatch_event_target.lambda is not in configuration)
- resource "aws_cloudwatch_event_target" "lambda" {
- arn = "arn:aws:lambda:us-east-1:256586139593:function:integrator-dev-auto-hibernate" -> null
- event_bus_name = "default" -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm-HibernateLambda" -> null
- rule = "integrator-dev-idle-alarm" -> null
- target_id = "HibernateLambda" -> null
}
# module.dev_launch_template.aws_iam_instance_profile.dev_instances will be destroyed
# (because aws_iam_instance_profile.dev_instances is not in configuration)
- resource "aws_iam_instance_profile" "dev_instances" {
- arn = "arn:aws:iam::256586139593:instance-profile/integrator-dev-instances-profile" -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- id = "integrator-dev-instances-profile" -> null
- name = "integrator-dev-instances-profile" -> null
- path = "/" -> null
- role = "integrator-dev-instances-role" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- unique_id = "AIPATXPNZY7EU2R6O3RLV" -> null
}
# module.dev_launch_template.aws_iam_role.dev_instances will be destroyed
# (because aws_iam_role.dev_instances is not in configuration)
- resource "aws_iam_role" "dev_instances" {
- arn = "arn:aws:iam::256586139593:role/integrator-dev-instances-role" -> null
- assume_role_policy = jsonencode(
{
- Statement = [
- {
- Action = "sts:AssumeRole"
- Effect = "Allow"
- Principal = {
- Service = "ec2.amazonaws.com"
}
},
]
- Version = "2012-10-17"
}
) -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- force_detach_policies = false -> null
- id = "integrator-dev-instances-role" -> null
- managed_policy_arns = [
- "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
- "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
] -> null
- max_session_duration = 3600 -> null
- name = "integrator-dev-instances-role" -> null
- path = "/" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- unique_id = "AROATXPNZY7ESPFWJ4H5B" -> null
- inline_policy {
- name = "ecr-access" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = [
... (output truncated)
🔵 Comment /apply-staging to apply these changes.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
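
To illustrate the note above, here is a minimal Python sketch of why an order-only difference in container definitions looks like a change under a naive comparison, while a real change (a new image) still stands out once ordering is normalized. This is illustrative only and is not the Terraform AWS provider's actual diff logic; the `normalize` helper, the image tags, and the environment variable names are hypothetical.

```python
import json


def normalize(container_definitions):
    """Return a canonical JSON string: containers and env vars sorted by name."""
    normalized = []
    for container in container_definitions:
        item = dict(container)
        # Environment entries are a list of {"name": ..., "value": ...} dicts.
        item["environment"] = sorted(
            item.get("environment", []), key=lambda entry: entry["name"]
        )
        normalized.append(item)
    normalized.sort(key=lambda c: c["name"])
    # sort_keys removes differences in key order inside each object.
    return json.dumps(normalized, sort_keys=True)


# Hypothetical definition as stored in state.
stored = [
    {
        "name": "api",
        "image": "integrator-api:abc",
        "environment": [
            {"name": "A", "value": "1"},
            {"name": "B", "value": "2"},
        ],
    }
]

# Same content as returned by the API, but with keys and env vars reordered.
returned = [
    {
        "image": "integrator-api:abc",
        "name": "api",
        "environment": [
            {"name": "B", "value": "2"},
            {"name": "A", "value": "1"},
        ],
    }
]

# A genuine change: the image tag differs.
changed = [dict(returned[0], image="integrator-api:def")]

naive_equal = json.dumps(stored) == json.dumps(returned)
normalized_equal = normalize(stored) == normalize(returned)
real_change_detected = normalize(stored) != normalize(changed)

print(naive_equal, normalized_equal, real_change_detected)
```

With these inputs the naive comparison reports a difference even though nothing meaningful changed, whereas the normalized comparison treats the two as equal and still flags the new image tag.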
module.github_actions_role.data.tls_certificate.github: Reading...
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.dev_launch_template.aws_lambda_function.auto_hibernate[0]: Refreshing state... [id=integrator-dev-auto-hibernate]
module.ecs_api.data.aws_vpc.selected: Reading...
module.dev_launch_template.aws_iam_role_policy.lambda_hibernate_instances[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role:hibernate-instances]
module.dev_launch_template.aws_iam_role_policy_attachment.ssm_managed_instance: Refreshing state... [id=integrator-dev-instances-role-20260209154824396500000001]
module.dev_launch_template.aws_iam_role_policy_attachment.cloudwatch_agent: Refreshing state... [id=integrator-dev-instances-role-20260209154824412400000002]
module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0]: Refreshing state... [id=integrator-dev-idle-alarm]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.dev_launch_template.aws_lambda_permission.eventbridge[0]: Refreshing state... [id=AllowExecutionFromEventBridge]
module.dev_launch_template.aws_iam_role.lambda_hibernate[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.dev_launch_template.aws_iam_role_policy.parameter_store_access: Refreshing state... [id=integrator-dev-instances-role:parameter-store-access]
module.dev_launch_template.aws_iam_role.dev_instances: Refreshing state... [id=integrator-dev-instances-role]
module.dev_launch_template.aws_launch_template.dev_instance: Refreshing state... [id=lt-0e0667353277bd721]
module.dev_launch_template.aws_iam_role_policy.ecr_access: Refreshing state... [id=integrator-dev-instances-role:ecr-access]
module.dev_launch_template.aws_iam_instance_profile.dev_instances: Refreshing state... [id=integrator-dev-instances-profile]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
module.dev_launch_template.aws_cloudwatch_event_target.lambda[0]: Refreshing state... [id=integrator-dev-idle-alarm-HibernateLambda]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
data.aws_iam_openid_connect_provider.github: Reading...
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
data.aws_iam_openid_connect_provider.github: Read complete after 0s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
data.aws_vpc.default: Reading...
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 1s [id=integrator-api]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.staging]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
data.aws_subnets.public: Reading...
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.rds.data.aws_vpc.selected: Reading...
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4989053128215798288]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 1s [id=integrator-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0] will be destroyed
# (because aws_cloudwatch_event_rule.idle_alarm is not in configuration)
- resource "aws_cloudwatch_event_rule" "idle_alarm" {
- arn = "arn:aws:events:us-east-1:256586139593:rule/integrator-dev-idle-alarm" -> null
- description = "Trigger hibernation lambda when dev instances are idle" -> null
- event_bus_name = "default" -> null
- event_pattern = jsonencode(
{
- detail = {
- alarmName = [
- {
- prefix = "integrator-dev-"
},
]
- state = {
- value = [
- "ALARM",
]
}
}
- detail-type = [
- "CloudWatch Alarm State Change",
]
- source = [
- "aws.cloudwatch",
]
}
) -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm" -> null
- is_enabled = true -> null
- name = "integrator-dev-idle-alarm" -> null
- state = "ENABLED" -> null
- tags = {} -> null
- tags_all = {} -> null
}
# module.dev_launch_template.aws_cloudwatch_event_target.lambda[0] will be destroyed
# (because aws_cloudwatch_event_target.lambda is not in configuration)
- resource "aws_cloudwatch_event_target" "lambda" {
- arn = "arn:aws:lambda:us-east-1:256586139593:function:integrator-dev-auto-hibernate" -> null
- event_bus_name = "default" -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm-HibernateLambda" -> null
- rule = "integrator-dev-idle-alarm" -> null
- target_id = "HibernateLambda" -> null
}
# module.dev_launch_template.aws_iam_instance_profile.dev_instances will be destroyed
# (because aws_iam_instance_profile.dev_instances is not in configuration)
- resource "aws_iam_instance_profile" "dev_instances" {
- arn = "arn:aws:iam::256586139593:instance-profile/integrator-dev-instances-profile" -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- id = "integrator-dev-instances-profile" -> null
- name = "integrator-dev-instances-profile" -> null
- path = "/" -> null
- role = "integrator-dev-instances-role" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- unique_id = "AIPATXPNZY7EU2R6O3RLV" -> null
}
# module.dev_launch_template.aws_iam_role.dev_instances will be destroyed
# (because aws_iam_role.dev_instances is not in configuration)
- resource "aws_iam_role" "dev_instances" {
- arn = "arn:aws:iam::256586139593:role/integrator-dev-instances-role" -> null
- assume_role_policy = jsonencode(
{
- Statement = [
- {
- Action = "sts:AssumeRole"
- Effect = "Allow"
- Principal = {
- Service = "ec2.amazonaws.com"
}
},
]
- Version = "2012-10-17"
}
) -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- force_detach_policies = false -> null
- id = "integrator-dev-instances-role" -> null
- managed_policy_arns = [
- "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
- "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
] -> null
- max_session_duration = 3600 -> null
- name = "integrator-dev-instances-role" -> null
- path = "/" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- unique_id = "AROATXPNZY7ESPFWJ4H5B" -> null
- inline_policy {
- name = "ecr-access" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = [
... (output truncated)
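The log-based monitors in this plan match on marker strings (`WORKFLOW_OOM_KILL`, `WORKFLOW_TIMEOUT`, `WORKFLOW_INFRA_FAILURE`) that the ECS poller writes into its log messages. As a rough illustration of how a poller could choose the marker for a stopped task, here is a minimal sketch. The function names, the `timed_out` flag, and the precedence of timeout over exit code are assumptions made for this example, not the actual code in `api/poller/ecs_poller.py`.

```python
import logging
from typing import Optional

logger = logging.getLogger("ecs_poller_sketch")  # hypothetical logger name

# Exit code 137 = 128 + 9 (SIGKILL), which the OOM monitor message above
# associates with an out-of-memory kill.
OOM_EXIT_CODE = 137


def classify_stopped_task(exit_code: Optional[int], timed_out: bool) -> Optional[str]:
    """Return the log marker for a stopped task, or None for a clean exit.

    Assumption: a task stopped by the poller for exceeding its timeout is
    reported as a timeout even if its exit code is also 137.
    """
    if timed_out:
        return "WORKFLOW_TIMEOUT"
    if exit_code == OOM_EXIT_CODE:
        return "WORKFLOW_OOM_KILL"
    if exit_code == 0:
        return None
    # Anything else (crash, manual stop, missing exit code) is treated as a
    # generic infrastructure failure, i.e. "not OOM or timeout".
    return "WORKFLOW_INFRA_FAILURE"


def log_stopped_task(workflow_id: str, exit_code: Optional[int], timed_out: bool) -> Optional[str]:
    """Emit one log line whose message text contains the marker string."""
    marker = classify_stopped_task(exit_code, timed_out)
    if marker is not None:
        # The marker is placed in the message text because the monitor
        # queries use a text search rather than an attribute search.
        logger.error("%s workflow_id=%s exit_code=%s", marker, workflow_id, exit_code)
    return marker
```

Because each monitor uses a `> 0` threshold over a 30-minute window, a single emitted line containing the marker is enough to trigger the corresponding alert.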
🔵 Comment /apply-staging to apply these changes.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
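To illustrate why ordering alone can produce a diff, the sketch below compares two container definitions after normalizing list order. It is only an illustration of the idea behind the note above: the helper names and the sample definitions are hypothetical, and this is not how the Terraform AWS provider compares state.

```python
import json
from typing import Any


def normalize(value: Any) -> Any:
    """Recursively sort lists so that element order no longer matters."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [normalize(item) for item in value]
        # Sort by a stable JSON rendering so lists of dicts can be ordered.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def same_definition(stored: dict, returned: dict) -> bool:
    """True when two definitions differ only in list ordering."""
    return normalize(stored) == normalize(returned)


# Hypothetical definitions: same content, environment listed in a different order.
stored = {
    "name": "api",
    "image": "example/api:1",
    "environment": [
        {"name": "A", "value": "1"},
        {"name": "B", "value": "2"},
    ],
}
returned = {
    "image": "example/api:1",
    "name": "api",
    "environment": [
        {"name": "B", "value": "2"},
        {"name": "A", "value": "1"},
    ],
}
changed = dict(returned, image="example/api:2")

print(stored == returned)                  # naive comparison sees a difference
print(same_definition(stored, returned))   # order-insensitive comparison does not
print(same_definition(stored, changed))    # a real change (new image) still shows up
```

One caveat with this approach: sorting every list would also hide reordering in fields where order is meaningful, such as a container `command`, so a real comparison would need to normalize only the order-insensitive fields.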
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
data.doppler_secrets.github: Reading...
module.github_actions_role.data.tls_certificate.github: Reading...
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
data.doppler_secrets.github: Read complete after 1s [id=integrator.prod]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 1s [id=-]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 1s [id=integration-aws-iam-permissions]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 0s [id=integration-aws-available-namespaces]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 1s [id=ami-02f058bc6b3ee6fcc]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_vpc.selected: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.private_subnets.data.aws_vpc.default: Reading...
module.private_subnets.data.aws_availability_zones.available: Reading...
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.private_subnets.data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
data.aws_caller_identity.current: Reading...
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 1s [id=integrator.prod]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.private_subnets.data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.data.aws_subnets.public: Reading...
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
data.aws_vpc.default: Reading...
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 0s [id=3755515486]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.rds.data.aws_vpc.selected: Reading...
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
data.aws_subnets.public: Reading...
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 0s [id=616917803]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 0s [id=1846264275]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=306319572]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=3858118069]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [PROD] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api"
name = "integrator-prod-api"
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
"Name" = "integrator-prod-api-service"
}
~ task_definition = "arn:a
... (output truncated)
⚠️ Production changes will be applied after merge to next branch.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
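
To illustrate the ordering effect described in the note, here is a minimal Python sketch. It does not reproduce the provider's actual diff logic. The container data, the `normalize` helper, and the choice of sorting by `name` are all hypothetical, picked only to show how two semantically identical definitions can compare as different until they are put into a canonical order, while a real change (a new image) still shows up.

```python
import json


def normalize(container_definitions):
    """Return a canonical JSON string for a list of container definitions.

    Hypothetical helper: sorts containers and their env vars by name so that
    ordering differences alone do not register as a change.
    """
    canonical = []
    for container in container_definitions:
        item = dict(container)
        item["environment"] = sorted(
            container.get("environment", []), key=lambda env: env["name"]
        )
        canonical.append(item)
    canonical.sort(key=lambda c: c["name"])
    # sort_keys removes any dependence on key order inside each object.
    return json.dumps(canonical, sort_keys=True)


# Hypothetical example data: same content, different ordering.
stored = [
    {
        "name": "api",
        "image": "example/api:1",
        "environment": [
            {"name": "B", "value": "2"},
            {"name": "A", "value": "1"},
        ],
    }
]
returned = [
    {
        "image": "example/api:1",
        "name": "api",
        "environment": [
            {"name": "A", "value": "1"},
            {"name": "B", "value": "2"},
        ],
    }
]

# A naive comparison sees a difference even though nothing meaningful changed.
naive_diff = json.dumps(stored) != json.dumps(returned)

# After normalization the two definitions compare equal.
normalized_diff = normalize(stored) != normalize(returned)

# A real change (new image tag) is still detected after normalization.
real_change = [dict(returned[0], image="example/api:2")]
real_diff = normalize(stored) != normalize(real_change)

print(naive_diff, normalized_diff, real_diff)
```

In this sketch the naive comparison reports a difference, the normalized one does not, and the image change is still flagged, which mirrors the "cosmetic only" behaviour described in the note.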
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.github_actions_role.data.tls_certificate.github: Reading...
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.dev_launch_template.aws_iam_role_policy.ec2_tag_access: Refreshing state... [id=integrator-dev-instances-role:ec2-tag-access]
module.dev_launch_template.aws_iam_role_policy_attachment.cloudwatch_agent: Refreshing state... [id=integrator-dev-instances-role-20260209154824412400000002]
module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0]: Refreshing state... [id=integrator-dev-idle-alarm]
module.dev_launch_template.aws_iam_role_policy.ecr_access: Refreshing state... [id=integrator-dev-instances-role:ecr-access]
module.dev_launch_template.aws_iam_role.dev_instances: Refreshing state... [id=integrator-dev-instances-role]
module.dev_launch_template.aws_iam_role_policy_attachment.ssm_managed_instance: Refreshing state... [id=integrator-dev-instances-role-20260209154824396500000001]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.dev_launch_template.aws_lambda_permission.eventbridge[0]: Refreshing state... [id=AllowExecutionFromEventBridge]
module.dev_launch_template.aws_iam_role.lambda_hibernate[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role]
module.dev_launch_template.aws_cloudwatch_event_target.lambda[0]: Refreshing state... [id=integrator-dev-idle-alarm-HibernateLambda]
module.dev_launch_template.aws_iam_role_policy.lambda_hibernate_instances[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role:hibernate-instances]
module.dev_launch_template.aws_lambda_function.auto_hibernate[0]: Refreshing state... [id=integrator-dev-auto-hibernate]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 1s [id=integrator.staging]
module.dev_launch_template.aws_iam_instance_profile.dev_instances: Refreshing state... [id=integrator-dev-instances-profile]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.dev_launch_template.aws_launch_template.dev_instance: Refreshing state... [id=lt-0e0667353277bd721]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.bedrock_profiles.data.aws_region.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
data.aws_iam_openid_connect_provider.github: Reading...
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
data.aws_caller_identity.current: Reading...
data.aws_vpc.default: Reading...
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
data.aws_iam_openid_connect_provider.github: Read complete after 0s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_api.data.aws_vpc.selected: Reading...
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 1s [id=integrator-api]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.data.aws_route_tables.main: Reading...
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.rds.data.aws_vpc.selected: Reading...
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4989053128215798288]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 1s [id=us-east-1]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 0s [id=integrator-background-runner]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
  - destroy
-/+ destroy and then create replacement

Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0] will be destroyed
# (because aws_cloudwatch_event_rule.idle_alarm is not in configuration)
- resource "aws_cloudwatch_event_rule" "idle_alarm" {
- arn = "arn:aws:events:us-east-1:256586139593:rule/integrator-dev-idle-alarm" -> null
- description = "Trigger hibernation lambda when dev instances are idle" -> null
- event_bus_name = "default" -> null
- event_pattern = jsonencode(
{
- detail = {
- alarmName = [
- {
- prefix = "integrator-dev-"
},
]
- state = {
- value = [
- "ALARM",
]
}
}
- detail-type = [
- "CloudWatch Alarm State Change",
]
- source = [
- "aws.cloudwatch",
]
}
) -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm" -> null
- is_enabled = true -> null
- name = "integrator-dev-idle-alarm" -> null
- state = "ENABLED" -> null
- tags = {} -> null
- tags_all = {} -> null
}
# module.dev_launch_template.aws_cloudwatch_event_target.lambda[0] will be destroyed
# (because aws_cloudwatch_event_target.lambda is not in configuration)
- resource "aws_cloudwatch_event_target" "lambda" {
- arn = "arn:aws:lambda:us-east-1:256586139593:function:integrator-dev-auto-hibernate" -> null
- event_bus_name = "default" -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm-HibernateLambda" -> null
- rule = "integrator-dev-idle-alarm" -> null
- target_id = "HibernateLambda" -> null
}
# module.dev_launch_template.aws_iam_instance_profile.dev_instances will be destroyed
# (because aws_iam_instance_profile.dev_instances is not in configuration)
- resource "aws_iam_instance_profile" "dev_instances" {
- arn = "arn:aws:iam::256586139593:instance-profile/integrator-dev-instances-profile" -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- id = "integrator-dev-instances-profile" -> null
- name = "integrator-dev-instances-profile" -> null
- path = "/" -> null
- role = "integrator-dev-instances-role" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- unique_id = "AIPATXPNZY7EU2R6O3RLV" -> null
}
# module.dev_launch_template.aws_iam_role.dev_instances will be destroyed
# (because aws_iam_role.dev_instances is not in configuration)
- resource "aws_iam_role" "dev_instances" {
- arn = "arn:aws:iam::256586139593:role/integrator-dev-instances-role" -> null
- assume_role_policy = jsonencode(
{
- Statement = [
- {
- Action = "sts:AssumeRole"
- Effect = "Allow"
- Principal = {
- Service = "ec2.amazonaws.com"
}
},
]
- Version = "2012-10-17"
}
) -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- force_detach_policies = false -> null
- id = "integrator-dev-instances-role" -> null
- managed_policy_arns = [
- "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
- "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
] -> null
- max_session_duration = 3600 -> null
- name = "integrator-dev-instances-role" -> null
- path = "/" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- unique_id = "AROATXPNZY7ESPFWJ4H5B" -> null
- inline_policy {
- name = "ec2-tag-access" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = "ec2:DescribeTags"
... (output truncated)
🔵 Comment /apply-staging to apply these changes.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
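
For reviewers who want to see why an ordering difference alone can look like a change, here is a minimal Python sketch. It is not the provider's actual diff logic. The container data, the `normalize` helper, and the assumption that the environment-variable list is the part that gets reordered are all hypothetical and chosen only for illustration.

```python
from typing import Any


def normalize(container_defs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy with containers and their env vars sorted by name."""
    normalized = []
    for container in container_defs:
        copy = dict(container)
        copy["environment"] = sorted(
            container.get("environment", []), key=lambda env: env["name"]
        )
        normalized.append(copy)
    return sorted(normalized, key=lambda c: c["name"])


# Hypothetical data: the same definition as kept in state vs. as returned by the API.
stored = [{"name": "api", "image": "example/api:1", "environment": [
    {"name": "LOG_LEVEL", "value": "info"}, {"name": "PORT", "value": "8000"}]}]
returned = [{"name": "api", "image": "example/api:1", "environment": [
    {"name": "PORT", "value": "8000"}, {"name": "LOG_LEVEL", "value": "info"}]}]

# Lists compare element by element, so a reordering alone registers as a difference.
naive_diff = stored != returned

# After sorting both sides the same way, the cosmetic difference disappears.
normalized_diff = normalize(stored) != normalize(returned)

# A real change (a new image tag) still shows up after normalization.
changed_image = [dict(returned[0], image="example/api:2")]
real_diff = normalize(stored) != normalize(changed_image)

print(naive_diff, normalized_diff, real_diff)
```

The point is that a positional comparison reports a difference for reordered but otherwise identical content, while a genuine change such as a new image tag remains visible either way. This matches the note's claim that the recurring replacement is cosmetic.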
data.doppler_secrets.github: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.github_actions_role.data.tls_certificate.github: Reading...
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 0s [id=integration-aws-iam-permissions]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 0s [id=integration-aws-available-namespaces]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 1s [id=integrator.prod]
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
data.doppler_secrets.github: Read complete after 1s [id=integrator.prod]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.private_subnets.data.aws_availability_zones.available: Reading...
module.private_subnets.data.aws_vpc.default: Reading...
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
module.ecs_api.data.aws_vpc.selected: Reading...
module.private_subnets.data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.private_subnets.data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.ecs_api.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 1s [id=ami-02f058bc6b3ee6fcc]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
data.aws_vpc.default: Reading...
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
module.rds.data.aws_vpc.selected: Reading...
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 0s [id=3755515486]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
data.aws_subnets.public: Reading...
module.rds.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=306319572]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 0s [id=616917803]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 0s [id=1846264275]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=3858118069]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [PROD] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api"
name = "integrator-prod-api"
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
"Name" = "integrator-prod-api-service"
}
~ task_definition = "arn:a
... (output truncated)
⚠️ Production changes will be applied after merge to next branch.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
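To illustrate why an ordering difference alone can read as a change while real edits are still caught, here is a minimal Python sketch. It is not the provider's actual diff logic; the container names, images, and the `normalize` helper are hypothetical and exist only for this example.

```python
import json


def normalize(container_definitions_json: str) -> list:
    """Sort container definitions and their env vars by name.

    Hypothetical helper: removes ordering differences so that only
    real content changes remain visible in a comparison.
    """
    definitions = json.loads(container_definitions_json)
    for definition in definitions:
        definition["environment"] = sorted(
            definition.get("environment", []), key=lambda env: env["name"]
        )
    return sorted(definitions, key=lambda definition: definition["name"])


# Hypothetical container definitions as stored in Terraform state.
in_state = json.dumps([
    {
        "name": "api",
        "image": "example/api:1",
        "environment": [
            {"name": "B", "value": "2"},
            {"name": "A", "value": "1"},
        ],
    },
    {"name": "datadog-agent", "image": "example/agent:1"},
])

# The same definitions as AWS might return them: different order, same content.
from_aws = json.dumps([
    {"name": "datadog-agent", "image": "example/agent:1", "environment": []},
    {
        "name": "api",
        "image": "example/api:1",
        "environment": [
            {"name": "A", "value": "1"},
            {"name": "B", "value": "2"},
        ],
    },
])

# A naive, order-sensitive comparison reports a difference...
raw_differs = json.loads(in_state) != json.loads(from_aws)

# ...but once ordering is normalized the two are identical.
same_after_normalize = normalize(in_state) == normalize(from_aws)

# A real change (a new image tag) still shows up after normalization.
with_new_image = json.dumps([
    {"name": "api", "image": "example/api:2", "environment": []},
    {"name": "datadog-agent", "image": "example/agent:1"},
])
real_change_detected = normalize(in_state) != normalize(with_new_image)

print(raw_differs, same_after_normalize, real_change_detected)
```

The point of the sketch is only that a reordering and a content change are distinguishable, which is consistent with the note that new images and env vars are still detected.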
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.github_actions_role.data.tls_certificate.github: Reading...
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
data.doppler_secrets.github: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
data.doppler_secrets.github: Read complete after 0s [id=integrator.prod]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 1s [id=integration-aws-available-namespaces]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 1s [id=integration-aws-iam-permissions]
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.prod]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.private_subnets.data.aws_availability_zones.available: Reading...
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
module.private_subnets.data.aws_availability_zones.available: Read complete after 1s [id=us-east-1]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
module.private_subnets.data.aws_vpc.default: Reading...
data.aws_caller_identity.current: Reading...
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.ecs_api.data.aws_vpc.selected: Reading...
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.private_subnets.data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 1s [id=ami-02f058bc6b3ee6fcc]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
data.aws_vpc.default: Reading...
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 1s [id=3755515486]
module.rds.data.aws_vpc.selected: Reading...
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
data.aws_subnets.public: Reading...
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 1s [id=us-east-1]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=306319572]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 0s [id=1846264275]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 0s [id=616917803]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=3858118069]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [PROD] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api"
name = "integrator-prod-api"
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
"Name" = "integrator-prod-api-service"
}
~ task_definition = "arn:a
... (output truncated)
⚠️ Production changes will be applied after merge to next branch.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only: real changes (new images, env vars) are still detected correctly.
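To illustrate the ordering issue described in the note, here is a minimal Python sketch of how a diff can be classified as cosmetic. It assumes the only difference is the order of `environment` / `secrets` entries inside the container definitions; the helper names (`normalize`, `is_cosmetic_diff`) and the sample data are hypothetical and are not part of the Terraform provider or this repository.

```python
import json


def normalize(container_definitions_json: str) -> list:
    """Parse container definitions and sort the lists whose order carries no meaning."""
    containers = json.loads(container_definitions_json)
    for container in containers:
        # Assumption: environment variables and secrets are unordered name/value entries.
        for key in ("environment", "secrets"):
            if key in container:
                container[key] = sorted(container[key], key=lambda item: item["name"])
    return sorted(containers, key=lambda c: c["name"])


def is_cosmetic_diff(stored: str, returned: str) -> bool:
    """True when the raw JSON differs but the normalized content is identical."""
    return stored != returned and normalize(stored) == normalize(returned)


# Hypothetical sample data: same content, different ordering of env vars.
stored = json.dumps([{"name": "api", "image": "api:1", "environment": [
    {"name": "A", "value": "1"}, {"name": "B", "value": "2"}]}])
returned = json.dumps([{"name": "api", "image": "api:1", "environment": [
    {"name": "B", "value": "2"}, {"name": "A", "value": "1"}]}])

# Hypothetical sample data: a real change (new image tag).
changed = json.dumps([{"name": "api", "image": "api:2", "environment": [
    {"name": "B", "value": "2"}, {"name": "A", "value": "1"}]}])

print(is_cosmetic_diff(stored, returned))
print(is_cosmetic_diff(stored, changed))
```

A check along these lines could be used when reviewing a plan by hand: if the normalized definitions match, the replacement is only the ordering artifact; if they differ, the plan contains a real change such as a new image or env var.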
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.github_actions_role.data.tls_certificate.github: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.dev_launch_template.aws_iam_role.lambda_hibernate[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role]
module.dev_launch_template.aws_lambda_permission.eventbridge[0]: Refreshing state... [id=AllowExecutionFromEventBridge]
module.dev_launch_template.aws_iam_role_policy.ec2_tag_access: Refreshing state... [id=integrator-dev-instances-role:ec2-tag-access]
module.dev_launch_template.aws_iam_role.dev_instances: Refreshing state... [id=integrator-dev-instances-role]
module.dev_launch_template.aws_iam_role_policy_attachment.cloudwatch_agent: Refreshing state... [id=integrator-dev-instances-role-20260209154824412400000002]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.dev_launch_template.aws_iam_role_policy.lambda_hibernate_instances[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role:hibernate-instances]
module.dev_launch_template.aws_cloudwatch_event_target.lambda[0]: Refreshing state... [id=integrator-dev-idle-alarm-HibernateLambda]
module.dev_launch_template.aws_iam_role_policy_attachment.ssm_managed_instance: Refreshing state... [id=integrator-dev-instances-role-20260209154824396500000001]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 1s [id=integrator.staging]
module.dev_launch_template.aws_iam_instance_profile.dev_instances: Refreshing state... [id=integrator-dev-instances-profile]
module.dev_launch_template.aws_iam_role_policy.ecr_access: Refreshing state... [id=integrator-dev-instances-role:ecr-access]
module.dev_launch_template.aws_launch_template.dev_instance: Refreshing state... [id=lt-0e0667353277bd721]
module.dev_launch_template.aws_lambda_function.auto_hibernate[0]: Refreshing state... [id=integrator-dev-auto-hibernate]
module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0]: Refreshing state... [id=integrator-dev-idle-alarm]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.data.aws_vpc.selected: Reading...
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
data.aws_iam_openid_connect_provider.github: Reading...
data.aws_vpc.default: Reading...
data.aws_iam_openid_connect_provider.github: Read complete after 1s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 1s [id=integrator-api]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
data.aws_subnets.public: Reading...
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
data.aws_subnets.public: Read complete after 1s [id=us-east-1]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.rds.data.aws_vpc.selected: Reading...
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4989053128215798288]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 1s [id=integrator-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0] will be destroyed
# (because aws_cloudwatch_event_rule.idle_alarm is not in configuration)
- resource "aws_cloudwatch_event_rule" "idle_alarm" {
- arn = "arn:aws:events:us-east-1:256586139593:rule/integrator-dev-idle-alarm" -> null
- description = "Trigger hibernation lambda when dev instances are idle" -> null
- event_bus_name = "default" -> null
- event_pattern = jsonencode(
{
- detail = {
- alarmName = [
- {
- prefix = "integrator-dev-"
},
]
- state = {
- value = [
- "ALARM",
]
}
}
- detail-type = [
- "CloudWatch Alarm State Change",
]
- source = [
- "aws.cloudwatch",
]
}
) -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm" -> null
- is_enabled = true -> null
- name = "integrator-dev-idle-alarm" -> null
- state = "ENABLED" -> null
- tags = {} -> null
- tags_all = {} -> null
}
# module.dev_launch_template.aws_cloudwatch_event_target.lambda[0] will be destroyed
# (because aws_cloudwatch_event_target.lambda is not in configuration)
- resource "aws_cloudwatch_event_target" "lambda" {
- arn = "arn:aws:lambda:us-east-1:256586139593:function:integrator-dev-auto-hibernate" -> null
- event_bus_name = "default" -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm-HibernateLambda" -> null
- rule = "integrator-dev-idle-alarm" -> null
- target_id = "HibernateLambda" -> null
}
# module.dev_launch_template.aws_iam_instance_profile.dev_instances will be destroyed
# (because aws_iam_instance_profile.dev_instances is not in configuration)
- resource "aws_iam_instance_profile" "dev_instances" {
- arn = "arn:aws:iam::256586139593:instance-profile/integrator-dev-instances-profile" -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- id = "integrator-dev-instances-profile" -> null
- name = "integrator-dev-instances-profile" -> null
- path = "/" -> null
- role = "integrator-dev-instances-role" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- unique_id = "AIPATXPNZY7EU2R6O3RLV" -> null
}
# module.dev_launch_template.aws_iam_role.dev_instances will be destroyed
# (because aws_iam_role.dev_instances is not in configuration)
- resource "aws_iam_role" "dev_instances" {
- arn = "arn:aws:iam::256586139593:role/integrator-dev-instances-role" -> null
- assume_role_policy = jsonencode(
{
- Statement = [
- {
- Action = "sts:AssumeRole"
- Effect = "Allow"
- Principal = {
- Service = "ec2.amazonaws.com"
}
},
]
- Version = "2012-10-17"
}
) -> null
- create_date = "2026-02-09T15:48:24Z" -> null
- force_detach_policies = false -> null
- id = "integrator-dev-instances-role" -> null
- managed_policy_arns = [
- "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
- "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
] -> null
- max_session_duration = 3600 -> null
- name = "integrator-dev-instances-role" -> null
- path = "/" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- unique_id = "AROATXPNZY7ESPFWJ4H5B" -> null
- inline_policy {
- name = "ec2-tag-access" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = "ec2:DescribeTags"
... (output truncated)
🔵 Comment /apply-staging to apply these changes.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only: real changes (new images, env vars) are still detected correctly.
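
For reviewers checking the thresholds in the plan above, here is a rough Python sketch of how the two monitor styles decide to fire. It is only an illustration, not Datadog's evaluator: the function names are hypothetical, and it assumes the metric monitor compares a single already-averaged `usage / limit * 100` value per `task_arn` against the planned thresholds (critical 85, warning 75), and that the log monitors fire on any match in the 30-minute window (`> 0`).

```python
# Thresholds copied from the planned monitor_thresholds block of workflow_memory_high.
CRITICAL_PCT = 85.0
WARNING_PCT = 75.0


def memory_state(usage_bytes: float, limit_bytes: float) -> str:
    """Classify one task's averaged memory sample (hypothetical helper).

    Mirrors the shape of the planned query:
    container.memory.usage / container.memory.limit * 100 > 85
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    pct = usage_bytes / limit_bytes * 100
    if pct > CRITICAL_PCT:
        return "alert"
    if pct > WARNING_PCT:
        return "warning"
    return "ok"


def log_alert_fires(matching_logs_last_30m: int) -> bool:
    """Log monitors in the plan use `> 0` with critical = "0": any match alerts."""
    return matching_logs_last_30m > 0


if __name__ == "__main__":
    print(memory_state(900, 1000))   # 90% of the limit
    print(memory_state(800, 1000))   # 80% of the limit
    print(log_alert_fires(1))        # a single WORKFLOW_OOM_KILL log line
```

Both comparisons are strict, so under this simplification a task sitting exactly at the threshold would not trip the monitor.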
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.github_actions_role.data.tls_certificate.github: Reading...
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.dev_launch_template.aws_iam_role_policy_attachment.ssm_managed_instance: Refreshing state... [id=integrator-dev-instances-role-20260212215001244800000002]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
module.dev_launch_template.aws_iam_role.dev_instances: Refreshing state... [id=integrator-dev-instances-role]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
module.dev_launch_template.aws_cloudwatch_event_target.lambda[0]: Refreshing state... [id=integrator-dev-idle-alarm-HibernateLambda]
module.dev_launch_template.aws_lambda_function.auto_hibernate[0]: Refreshing state... [id=integrator-dev-auto-hibernate]
module.dev_launch_template.aws_launch_template.dev_instance: Refreshing state... [id=lt-06ad0f45df055e165]
module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0]: Refreshing state... [id=integrator-dev-idle-alarm]
module.dev_launch_template.aws_lambda_permission.eventbridge[0]: Refreshing state... [id=AllowExecutionFromEventBridge]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.dev_launch_template.aws_iam_role_policy_attachment.cloudwatch_agent: Refreshing state... [id=integrator-dev-instances-role-20260212215001173200000001]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.dev_launch_template.aws_iam_role_policy.ecr_access: Refreshing state... [id=integrator-dev-instances-role:ecr-access]
module.dev_launch_template.aws_iam_role_policy.ec2_tag_access: Refreshing state... [id=integrator-dev-instances-role:ec2-tag-access]
module.dev_launch_template.aws_iam_role.lambda_hibernate[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role]
module.dev_launch_template.aws_iam_role_policy.lambda_hibernate_instances[0]: Refreshing state... [id=integrator-dev-lambda-hibernate-role:hibernate-instances]
module.dev_launch_template.aws_iam_instance_profile.dev_instances: Refreshing state... [id=integrator-dev-instances-profile]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.ecs_api.data.aws_vpc.selected: Reading...
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.bedrock_profiles.data.aws_region.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
data.aws_caller_identity.current: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
data.aws_vpc.default: Reading...
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
data.aws_iam_openid_connect_provider.github: Reading...
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 1s [id=integrator-api]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
data.aws_iam_openid_connect_provider.github: Read complete after 0s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.staging]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
data.aws_subnets.public: Reading...
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.rds.data.aws_vpc.selected: Reading...
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.rds.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4989053128215798288]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 1s [id=integrator-background-runner]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0] will be destroyed
# (because aws_cloudwatch_event_rule.idle_alarm is not in configuration)
- resource "aws_cloudwatch_event_rule" "idle_alarm" {
- arn = "arn:aws:events:us-east-1:256586139593:rule/integrator-dev-idle-alarm" -> null
- description = "Trigger hibernation lambda when dev instances are idle" -> null
- event_bus_name = "default" -> null
- event_pattern = jsonencode(
{
- detail = {
- alarmName = [
- {
- prefix = "integrator-dev-"
},
]
- state = {
- value = [
- "ALARM",
]
}
}
- detail-type = [
- "CloudWatch Alarm State Change",
]
- source = [
- "aws.cloudwatch",
]
}
) -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm" -> null
- is_enabled = true -> null
- name = "integrator-dev-idle-alarm" -> null
- state = "ENABLED" -> null
- tags = {} -> null
- tags_all = {} -> null
}
# module.dev_launch_template.aws_cloudwatch_event_target.lambda[0] will be destroyed
# (because aws_cloudwatch_event_target.lambda is not in configuration)
- resource "aws_cloudwatch_event_target" "lambda" {
- arn = "arn:aws:lambda:us-east-1:256586139593:function:integrator-dev-auto-hibernate" -> null
- event_bus_name = "default" -> null
- force_destroy = false -> null
- id = "integrator-dev-idle-alarm-HibernateLambda" -> null
- rule = "integrator-dev-idle-alarm" -> null
- target_id = "HibernateLambda" -> null
}
# module.dev_launch_template.aws_iam_instance_profile.dev_instances will be destroyed
# (because aws_iam_instance_profile.dev_instances is not in configuration)
- resource "aws_iam_instance_profile" "dev_instances" {
- arn = "arn:aws:iam::256586139593:instance-profile/integrator-dev-instances-profile" -> null
- create_date = "2026-02-12T21:50:01Z" -> null
- id = "integrator-dev-instances-profile" -> null
- name = "integrator-dev-instances-profile" -> null
- path = "/" -> null
- role = "integrator-dev-instances-role" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-profile"
} -> null
- unique_id = "AIPATXPNZY7EW4IPXUNNG" -> null
}
# module.dev_launch_template.aws_iam_role.dev_instances will be destroyed
# (because aws_iam_role.dev_instances is not in configuration)
- resource "aws_iam_role" "dev_instances" {
- arn = "arn:aws:iam::256586139593:role/integrator-dev-instances-role" -> null
- assume_role_policy = jsonencode(
{
- Statement = [
- {
- Action = "sts:AssumeRole"
- Effect = "Allow"
- Principal = {
- Service = "ec2.amazonaws.com"
}
},
]
- Version = "2012-10-17"
}
) -> null
- create_date = "2026-02-12T21:50:00Z" -> null
- force_detach_policies = false -> null
- id = "integrator-dev-instances-role" -> null
- managed_policy_arns = [
- "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
- "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
] -> null
- max_session_duration = 3600 -> null
- name = "integrator-dev-instances-role" -> null
- path = "/" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-role"
} -> null
- unique_id = "AROATXPNZY7E3XXDFQOY7" -> null
- inline_policy {
- name = "ec2-tag-access" -> null
- policy = jsonencode(
{
- Statement = [
- {
- Action = "ec2:DescribeTags"
... (output truncated)
🔵 Comment /apply-staging to apply these changes.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
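
To illustrate why a pure ordering difference can show up as a change, here is a minimal Python sketch. It is not the provider's actual comparison logic; the container names, images, and the `normalize` helper are hypothetical and exist only for this example. It compares two container-definition lists as-is, then again after sorting them into a canonical order.

```python
import json


def normalize(container_definitions):
    """Return a copy with containers sorted by name and env vars sorted by name."""
    normalized = []
    for container in sorted(container_definitions, key=lambda c: c["name"]):
        item = dict(container)
        if "environment" in item:
            item["environment"] = sorted(item["environment"], key=lambda e: e["name"])
        normalized.append(item)
    return normalized


# Hypothetical definitions: same content, different ordering.
stored = [
    {"name": "api", "image": "example/api:1", "environment": [
        {"name": "B", "value": "2"}, {"name": "A", "value": "1"}]},
    {"name": "sidecar", "image": "example/sidecar:1"},
]
returned = [
    {"name": "sidecar", "image": "example/sidecar:1"},
    {"name": "api", "image": "example/api:1", "environment": [
        {"name": "A", "value": "1"}, {"name": "B", "value": "2"}]},
]

# A naive serialized comparison reports a difference (ordering only).
naive_diff = json.dumps(stored) != json.dumps(returned)

# After normalization the two sides are equal, so no real change exists.
real_diff = normalize(stored) != normalize(returned)

# A genuine change (new image tag) is still detected after normalization.
changed = [dict(c) for c in returned]
changed[1]["image"] = "example/api:2"
image_diff = normalize(stored) != normalize(changed)

print(naive_diff, real_diff, image_diff)
```

In this sketch the ordering-only case differs under the naive comparison but not after normalization, while the image change differs either way, which matches the behaviour described in the note.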
module.github_actions_role.data.tls_certificate.github: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
module.doppler_aws_secrets.data.external.env_vars: Reading...
data.doppler_secrets.github: Reading...
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
data.doppler_secrets.github: Read complete after 1s [id=integrator.prod]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 1s [id=-]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.prod]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.github_actions_role.data.tls_certificate.github: Read complete after 1s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.private_subnets.data.aws_vpc.default: Reading...
module.private_subnets.data.aws_availability_zones.available: Reading...
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
module.ecs_api.data.aws_vpc.selected: Reading...
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
module.private_subnets.data.aws_availability_zones.available: Read complete after 1s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.private_subnets.data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 0s [id=ami-02f058bc6b3ee6fcc]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 0s [id=integration-aws-iam-permissions]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 0s [id=integration-aws-available-namespaces]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.data.aws_route_tables.main: Reading...
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.data.aws_route_tables.main: Read complete after 1s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
data.aws_vpc.default: Reading...
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
module.rds.data.aws_vpc.selected: Reading...
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
data.aws_subnets.public: Reading...
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 0s [id=3755515486]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.rds.aws_security_group_rule.bastion_proxy[0]: Refreshing state... [id=sgrule-417166805]
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=306319572]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=3858118069]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 0s [id=616917803]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 0s [id=1846264275]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
<= read (data resources)
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [PROD] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api"
name = "integrator-prod-api"
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
"Name" = "integrator-prod-api-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-
... (output truncated)
⚠️ Production changes will be applied after merge to next branch.
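For reviewers reading the plan above, here is a rough sketch of the HCL that could produce the `alb_502[0]` entry. The attribute values are copied from the plan output; the variable names (`environment`, `alb_arn_suffix`, `slack_channel`, `enable_alb_monitors`) are assumptions for illustration and may not match the real module in `infra/terraform/modules/datadog-infra-alerts/`.

```hcl
# Hypothetical inputs; the module's actual variable names are not shown in this thread.
variable "environment" {
  type = string # e.g. "prod"
}

variable "alb_arn_suffix" {
  type = string # e.g. "app/integrator-prod-alb/a90df99d7efaafb4"
}

variable "slack_channel" {
  type = string # e.g. "@slack-integrator-prod-alerts"
}

variable "enable_alb_monitors" {
  type    = bool
  default = true
}

resource "datadog_monitor" "alb_502" {
  # A count of 0 or 1 would explain the [0] index in the plan address.
  count = var.enable_alb_monitors ? 1 : 0

  name     = "[${upper(var.environment)}] ALB 502 Bad Gateway"
  type     = "metric alert"
  priority = "1"

  # Critical-only: any 502 within the 5-minute window triggers the alert.
  query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:${var.alb_arn_suffix}} > 0"

  monitor_thresholds {
    critical = "0"
  }

  # Abbreviated message; the full text appears in the plan output.
  message = <<-EOT
    {{#is_alert}}
    :rotating_light: [${upper(var.environment)}] ALB 502 Bad Gateway Errors Detected
    **502 Count:** {{value}}
    {{/is_alert}}
    {{#is_recovery}}
    :white_check_mark: [${upper(var.environment)}] ALB 502 Errors Recovered
    {{/is_recovery}}
    ${var.slack_channel}
  EOT

  notify_no_data      = false
  notify_audit        = false
  require_full_window = false
  renotify_interval   = 0
  new_host_delay      = 300
  timeout_h           = 0
  include_tags        = true

  tags = [
    "environment:${var.environment}",
    "service:alb",
    "alert_type:bad_gateway",
    "team:integrations",
  ]
}
```

The same shape, with `type = "log alert"` and a `logs(...)` query, would apply to the workflow monitors. With `critical = "0"` and no `warning` threshold, there is no intermediate warning state, which matches the removal of the `{{#is_warning}}` block from `alb_5xx` in the plan.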
...(earlier output truncated)...
GN1dCAtZC8gLWYxKScKRXhlY1N0YXJ0UHJlPS9iaW4vYmFzaCAtYyAnZG9ja2VyIHB1bGwgXCRET0NLRVJfSU1BR0UnCkV4ZWNTdGFydFByZT0vYmluL2Jhc2ggLWMgJ2RvY2tlciBybSAtZiBcJENPTlRBSU5FUl9OQU1FIDI+L2Rldi9udWxsIHx8IHRydWUnCgpFeGVjU3RhcnQ9L2Jpbi9iYXNoIC1jICdkb2NrZXIgcnVuIC1kIC0tbmFtZSBcJENPTlRBSU5FUl9OQU1FIC0tcmVzdGFydCB1bmxlc3Mtc3RvcHBlZCAtdiBcJFdPUktTUEFDRV9QQVRIOi93b3Jrc3BhY2UgLXYgL2hvbWUvdWJ1bnR1Ly5jbGF1ZGU6L3Jvb3QvLmNsYXVkZSAtdiAvdmFyL3J1bi9kb2NrZXIuc29jazovdmFyL3J1bi9kb2NrZXIuc29jayAtZSBBV1NfUkVHSU9OPVwkUkVHSU9OIC0td29ya2RpciAvd29ya3NwYWNlIFwkRE9DS0VSX0lNQUdFIHRhaWwgLWYgL2Rldi9udWxsJwoKRXhlY1N0b3A9L2Jpbi9iYXNoIC1jICdkb2NrZXIgc3RvcCBcJENPTlRBSU5FUl9OQU1FIHx8IHRydWUnCkV4ZWNTdG9wUG9zdD0vYmluL2Jhc2ggLWMgJ2RvY2tlciBybSBcJENPTlRBSU5FUl9OQU1FIHx8IHRydWUnCgpSZXN0YXJ0PW9uLWZhaWx1cmUKUmVzdGFydFNlYz0xMAoKW0luc3RhbGxdCldhbnRlZEJ5PW11bHRpLXVzZXIudGFyZ2V0ClNZU1RFTURfRU9GCgojIEVuc3VyZSB1YnVudHUgdXNlciBjYW4gcnVuIGRvY2tlciB3aXRob3V0IHN1ZG8KdXNlcm1vZCAtYUcgZG9ja2VyIHVidW50dQoKIyBBdXRvLWV4ZWMgaW50byBjb250YWluZXIgb24gU1NIIGxvZ2luCmNhdCA+PiAvaG9tZS91YnVudHUvLmJhc2hyYyA8PCdCQVNIUkNfRU9GJwoKIyBBdXRvLWVudGVyIGRldiBjb250YWluZXIgb24gU1NIIGxvZ2luCmlmIFsgLW4gIiRTU0hfQ09OTkVDVElPTiIgXSAmJiBbIC1uICIkKGRvY2tlciBwcyAtcSAtZiBuYW1lPWNsYXVkZS1kZXYgMj4vZGV2L251bGwpIiBdOyB0aGVuCiAgICBleGVjIGRvY2tlciBleGVjIC1pdCBjbGF1ZGUtZGV2IGJhc2gKZmkKQkFTSFJDX0VPRgoKIyBTdGFydCBzZXJ2aWNlCnN5c3RlbWN0bCBkYWVtb24tcmVsb2FkCnN5c3RlbWN0bCBlbmFibGUgY2xhdWRlLWRldi1jb250YWluZXIuc2VydmljZQpzeXN0ZW1jdGwgc3RhcnQgY2xhdWRlLWRldi1jb250YWluZXIuc2VydmljZQoKZWNobyAiPT09IFNldHVwIGNvbXBsZXRlID09PSIK" -> null
- vpc_security_group_ids = [
- "sg-0bf64ade7c98a1da6",
] -> null
- block_device_mappings {
- device_name = "/dev/sda1" -> null
- ebs {
- delete_on_termination = "true" -> null
- encrypted = "true" -> null
- iops = 0 -> null
- throughput = 0 -> null
- volume_initialization_rate = 0 -> null
- volume_size = 100 -> null
- volume_type = "gp3" -> null
}
}
- hibernation_options {
- configured = true -> null
}
- iam_instance_profile {
- arn = "arn:aws:iam::256586139593:instance-profile/integrator-dev-instances-profile" -> null
}
- metadata_options {
- http_endpoint = "enabled" -> null
- http_put_response_hop_limit = 1 -> null
- http_tokens = "required" -> null
}
- monitoring {
- enabled = true -> null
}
- tag_specifications {
- resource_type = "instance" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "launch-template"
- "Project" = "integrator"
} -> null
}
- tag_specifications {
- resource_type = "volume" -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "launch-template"
- "Project" = "integrator"
} -> null
}
}
# module.dev_launch_template.aws_security_group.dev_instances will be destroyed
# (because aws_security_group.dev_instances is not in configuration)
- resource "aws_security_group" "dev_instances" {
- arn = "arn:aws:ec2:us-east-1:256586139593:security-group/sg-0bf64ade7c98a1da6" -> null
- description = "Security group for personal dev EC2 instances" -> null
- egress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "Allow all outbound traffic"
- from_port = 0
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "-1"
- security_groups = []
- self = false
- to_port = 0
},
] -> null
- id = "sg-0bf64ade7c98a1da6" -> null
- ingress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "SSH from anywhere (restricted by individual instance tags)"
- from_port = 22
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "tcp"
- security_groups = []
- self = false
- to_port = 22
},
] -> null
- name = "integrator-dev-instances" -> null
- owner_id = "256586139593" -> null
- revoke_rules_on_delete = false -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- vpc_id = "vpc-0529dea5160deb846" -> null
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api"
name = "integrator-staging-api"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:144" -> (known after apply)
# (16 unchanged attributes hidden)
# (4 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_service.poller will be updated in-place
~ resource "aws_ecs_service" "poller" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller"
name = "integrator-staging-poller"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:130" -> (known after apply)
# (16 unchanged attributes hidden)
# (3 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_task_definition.api must be replaced
-/+ resource "aws_ecs_task_definition" "api" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:144" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-api" -> (known after apply)
~ revision = 144 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_ecs_task_definition.poller must be replaced
-/+ resource "aws_ecs_task_definition" "poller" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:130" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-poller" -> (known after apply)
~ revision = 130 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_iam_policy.launch_background_runner will be updated in-place
~ resource "aws_iam_policy" "launch_background_runner" {
id = "arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy"
name = "integrator-staging-api-launch-tasks-policy"
~ policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "ecs:RunTask",
- "ecs:DescribeTasks",
- "ecs:StopTask",
- "ecs:ListTasks",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:*",
- "arn:aws:ecs:us-east-1:256586139593:task/default/*",
]
},
- {
- Action = [
- "iam:PassRole",
]
- Condition = {
- StringEquals = {
- "iam:PassedToService" = "ecs-tasks.amazonaws.com"
}
}
- Effect = "Allow"
- Resource = [
- "arn:aws:iam::256586139593:role/integrator-staging-ecs-task-execution-role",
- "arn:aws:iam::256586139593:role/integrator-staging-ecs-task-role",
]
},
]
- Version = "2012-10-17"
}
) -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-launch-tasks-policy"
}
# (6 unchanged attributes hidden)
}
# module.ecs_background_runner.aws_ecs_task_definition.background_runner must be replaced
-/+ resource "aws_ecs_task_definition" "background_runner" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:145" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-background-runner" -> (known after apply)
~ revision = 145 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-background-runner"
}
# (10 unchanged attributes hidden)
}
Plan: 12 to add, 4 to change, 17 to destroy.
Changes to Outputs:
~ api_task_definition_arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:144" -> (known after apply)
~ ecs_task_definition_arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:145" -> (known after apply)
module.dev_launch_template.aws_iam_role_policy_attachment.ssm_managed_instance: Destroying... [id=integrator-dev-instances-role-20260212215001244800000002]
module.dev_launch_template.aws_lambda_permission.eventbridge[0]: Destroying... [id=AllowExecutionFromEventBridge]
module.dev_launch_template.aws_iam_role_policy.ec2_tag_access: Destroying... [id=integrator-dev-instances-role:ec2-tag-access]
module.ecs_api.aws_ecs_task_definition.api: Destroying... [id=integrator-staging-api]
module.dev_launch_template.aws_iam_role_policy.lambda_hibernate_instances[0]: Destroying... [id=integrator-dev-lambda-hibernate-role:hibernate-instances]
module.dev_launch_template.aws_launch_template.dev_instance: Destroying... [id=lt-06ad0f45df055e165]
module.dev_launch_template.aws_cloudwatch_event_target.lambda[0]: Destroying... [id=integrator-dev-idle-alarm-HibernateLambda]
module.dev_launch_template.aws_iam_role_policy_attachment.cloudwatch_agent: Destroying... [id=integrator-dev-instances-role-20260212215001173200000001]
module.dev_launch_template.aws_lambda_permission.eventbridge[0]: Destruction complete after 0s
module.dev_launch_template.aws_iam_role_policy.ecr_access: Destroying... [id=integrator-dev-instances-role:ecr-access]
module.datadog_infra_alerts.datadog_monitor.api_5xx_logs: Creating...
module.dev_launch_template.aws_iam_role_policy.lambda_hibernate_instances[0]: Destruction complete after 0s
module.dev_launch_template.aws_iam_role_policy.ec2_tag_access: Destruction complete after 0s
module.ecs_api.aws_ecs_task_definition.poller: Destroying... [id=integrator-staging-poller]
module.dev_launch_template.aws_iam_role_policy_attachment.cloudwatch_agent: Destruction complete after 0s
module.dev_launch_template.aws_iam_role_policy_attachment.ssm_managed_instance: Destruction complete after 0s
module.dev_launch_template.aws_cloudwatch_event_target.lambda[0]: Destruction complete after 0s
module.dev_launch_template.aws_lambda_function.auto_hibernate[0]: Destroying... [id=integrator-dev-auto-hibernate]
module.ecs_api.aws_ecs_task_definition.api: Destruction complete after 0s
module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed: Creating...
module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0]: Destroying... [id=integrator-dev-idle-alarm]
module.dev_launch_template.aws_iam_role_policy.ecr_access: Destruction complete after 0s
module.ecs_api.aws_ecs_task_definition.poller: Destruction complete after 0s
module.dev_launch_template.aws_launch_template.dev_instance: Destruction complete after 0s
module.dev_launch_template.aws_cloudwatch_event_rule.idle_alarm[0]: Destruction complete after 0s
module.dev_launch_template.aws_iam_instance_profile.dev_instances: Destroying... [id=integrator-dev-instances-profile]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Destroying... [id=integrator-staging-background-runner]
module.datadog_infra_alerts.datadog_monitor.api_5xx_logs: Creation complete after 1s [id=258573132]
module.dev_launch_template.aws_security_group.dev_instances: Destroying... [id=sg-0bf64ade7c98a1da6]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Destruction complete after 1s
module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0]: Creating...
module.datadog_infra_alerts.datadog_monitor.alb_502[0]: Creating...
module.datadog_infra_alerts.datadog_monitor.alb_504[0]: Creating...
module.dev_launch_template.aws_lambda_function.auto_hibernate[0]: Destruction complete after 1s
module.dev_launch_template.aws_iam_role.lambda_hibernate[0]: Destroying... [id=integrator-dev-lambda-hibernate-role]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Creating...
module.dev_launch_template.aws_iam_instance_profile.dev_instances: Destruction complete after 1s
module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed: Creation complete after 1s [id=258573133]
module.dev_launch_template.aws_iam_role.dev_instances: Destroying... [id=integrator-dev-instances-role]
module.dev_launch_template.aws_iam_role.lambda_hibernate[0]: Destruction complete after 0s
module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0]: Creation complete after 0s [id=258573134]
module.datadog_infra_alerts.datadog_monitor.alb_504[0]: Creation complete after 0s [id=258573135]
module.datadog_infra_alerts.datadog_monitor.alb_502[0]: Creation complete after 0s [id=258573136]
module.dev_launch_template.aws_iam_role.dev_instances: Destruction complete after 0s
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Creation complete after 0s [id=integrator-staging-background-runner]
module.ecs_api.aws_ecs_task_definition.poller: Creating...
module.ecs_api.aws_ecs_task_definition.api: Creating...
module.datadog_infra_alerts.datadog_monitor.workflow_memory_high: Creating...
module.ecs_api.aws_ecs_task_definition.poller: Creation complete after 0s [id=integrator-staging-poller]
module.ecs_api.aws_ecs_service.poller: Modifying... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_task_definition.api: Creation complete after 0s [id=integrator-staging-api]
module.ecs_api.aws_ecs_service.api: Modifying... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill: Creating...
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Modifying... [id=254279883]
module.datadog_infra_alerts.datadog_monitor.workflow_memory_high: Creation complete after 0s [id=258573137]
module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed: Creating...
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Modifications complete after 0s [id=254279883]
module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill: Creation complete after 0s [id=258573140]
module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed: Creation complete after 0s [id=258573142]
module.ecs_api.aws_ecs_service.poller: Modifications complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Modifications complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 10s elapsed]
module.datadog_infra_alerts.datadog_monitor.workflow_timeout: Creating...
module.datadog_infra_alerts.datadog_monitor.workflow_timeout: Creation complete after 0s [id=258573163]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 20s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 15m0s elapsed]
Warning: Argument is deprecated
with module.datadog_dashboards.datadog_dashboard.cost,
on ../../modules/datadog-dashboards/cost.tf line 16, in resource "datadog_dashboard" "cost":
16: default = "*"
Use `defaults` instead.
(and 13 more similar warnings elsewhere)
Error: deleting Security Group (sg-0bf64ade7c98a1da6): operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: 3ddb88c1-09ba-487a-a418-290f70707a5f, api error DependencyViolation: resource sg-0bf64ade7c98a1da6 has a dependent object
⚠️ Apply failed. Please check the output above.
...(earlier output truncated)...
eshing state... [id=254279883]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.datadog_infra_alerts.datadog_monitor.alb_502[0]: Refreshing state... [id=258573136]
module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0]: Refreshing state... [id=258573134]
module.datadog_infra_alerts.datadog_monitor.alb_504[0]: Refreshing state... [id=258573135]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4989053128215798288]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 0s [id=integrator-background-runner]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.datadog_infra_alerts.datadog_monitor.workflow_timeout: Refreshing state... [id=258573163]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.datadog_infra_alerts.datadog_monitor.workflow_memory_high: Refreshing state... [id=258573137]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed: Refreshing state... [id=258573142]
module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill: Refreshing state... [id=258573140]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
~ update in-place
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.dev_launch_template.aws_security_group.dev_instances will be destroyed
# (because aws_security_group.dev_instances is not in configuration)
- resource "aws_security_group" "dev_instances" {
- arn = "arn:aws:ec2:us-east-1:256586139593:security-group/sg-0bf64ade7c98a1da6" -> null
- description = "Security group for personal dev EC2 instances" -> null
- egress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "Allow all outbound traffic"
- from_port = 0
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "-1"
- security_groups = []
- self = false
- to_port = 0
},
] -> null
- id = "sg-0bf64ade7c98a1da6" -> null
- ingress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "SSH from anywhere (restricted by individual instance tags)"
- from_port = 22
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "tcp"
- security_groups = []
- self = false
- to_port = 22
},
] -> null
- name = "integrator-dev-instances" -> null
- owner_id = "256586139593" -> null
- revoke_rules_on_delete = false -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- vpc_id = "vpc-0529dea5160deb846" -> null
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api"
name = "integrator-staging-api"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:145" -> (known after apply)
# (16 unchanged attributes hidden)
# (4 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_service.poller will be updated in-place
~ resource "aws_ecs_service" "poller" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller"
name = "integrator-staging-poller"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:131" -> (known after apply)
# (16 unchanged attributes hidden)
# (3 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_task_definition.api must be replaced
-/+ resource "aws_ecs_task_definition" "api" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:145" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-api" -> (known after apply)
~ revision = 145 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_ecs_task_definition.poller must be replaced
-/+ resource "aws_ecs_task_definition" "poller" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:131" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-poller" -> (known after apply)
~ revision = 131 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_iam_policy.launch_background_runner will be updated in-place
~ resource "aws_iam_policy" "launch_background_runner" {
id = "arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy"
name = "integrator-staging-api-launch-tasks-policy"
~ policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "ecs:RunTask",
- "ecs:DescribeTasks",
- "ecs:StopTask",
- "ecs:ListTasks",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:*",
- "arn:aws:ecs:us-east-1:256586139593:task/default/*",
]
},
- {
- Action = [
- "iam:PassRole",
]
- Condition = {
- StringEquals = {
- "iam:PassedToService" = "ecs-tasks.amazonaws.com"
}
}
- Effect = "Allow"
- Resource = [
- "arn:aws:iam::256586139593:role/integrator-staging-ecs-task-execution-role",
- "arn:aws:iam::256586139593:role/integrator-staging-ecs-task-role",
]
},
]
- Version = "2012-10-17"
}
) -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-launch-tasks-policy"
}
# (6 unchanged attributes hidden)
}
# module.ecs_background_runner.aws_ecs_task_definition.background_runner must be replaced
-/+ resource "aws_ecs_task_definition" "background_runner" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:146" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-background-runner" -> (known after apply)
~ revision = 146 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-background-runner"
}
# (10 unchanged attributes hidden)
}
Plan: 3 to add, 3 to change, 4 to destroy.
Changes to Outputs:
~ api_task_definition_arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:145" -> (known after apply)
~ ecs_task_definition_arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:146" -> (known after apply)
module.ecs_api.aws_ecs_task_definition.api: Destroying... [id=integrator-staging-api]
module.dev_launch_template.aws_security_group.dev_instances: Destroying... [id=sg-0bf64ade7c98a1da6]
module.ecs_api.aws_ecs_task_definition.poller: Destroying... [id=integrator-staging-poller]
module.ecs_api.aws_ecs_task_definition.api: Destruction complete after 0s
module.ecs_api.aws_ecs_task_definition.poller: Destruction complete after 0s
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Destroying... [id=integrator-staging-background-runner]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Destruction complete after 0s
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Creating...
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Creation complete after 1s [id=integrator-staging-background-runner]
module.ecs_api.aws_ecs_task_definition.poller: Creating...
module.ecs_api.aws_ecs_task_definition.api: Creating...
module.ecs_api.aws_ecs_task_definition.poller: Creation complete after 0s [id=integrator-staging-poller]
module.ecs_api.aws_ecs_task_definition.api: Creation complete after 0s [id=integrator-staging-api]
module.ecs_api.aws_ecs_service.poller: Modifying... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Modifying... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_ecs_service.poller: Modifications complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Modifications complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 10s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 20s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 9m0s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 9m10s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 9m20s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 9m30s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 9m40s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 9m50s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 10m0s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 10m10s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 10m20s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 10m30s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 10m40s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 10m50s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 11m0s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 11m10s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 11m20s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 11m30s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 11m40s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 11m50s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 12m0s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 12m10s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 12m20s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 12m30s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 12m40s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 12m50s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 13m0s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 13m10s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 13m20s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 13m30s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 13m40s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 13m50s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 14m0s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 14m10s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 14m20s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 14m30s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 14m40s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 14m50s elapsed]
module.dev_launch_template.aws_security_group.dev_instances: Still destroying... [id=sg-0bf64ade7c98a1da6, 15m0s elapsed]
Warning: Argument is deprecated
with module.datadog_dashboards.datadog_dashboard.cost,
on ../../modules/datadog-dashboards/cost.tf line 16, in resource "datadog_dashboard" "cost":
16: default = "*"
Use `defaults` instead.
(and 13 more similar warnings elsewhere)
Error: deleting Security Group (sg-0bf64ade7c98a1da6): operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: 403ffa69-6ada-4cbf-bd0e-367a3a349e1a, api error DependencyViolation: resource sg-0bf64ade7c98a1da6 has a dependent object
⚠️ Apply failed. Please check the output above.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
data.doppler_secrets.github: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
module.github_actions_role.data.tls_certificate.github: Reading...
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
data.doppler_secrets.github: Read complete after 0s [id=integrator.prod]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.prod]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 1s [id=-]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 0s [id=integration-aws-available-namespaces]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 0s [id=integration-aws-iam-permissions]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.private_subnets.data.aws_vpc.default: Reading...
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.private_subnets.data.aws_availability_zones.available: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
data.aws_caller_identity.current: Reading...
module.private_subnets.data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_vpc.selected: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.private_subnets.data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 1s [id=ami-02f058bc6b3ee6fcc]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
data.aws_vpc.default: Reading...
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
module.rds.data.aws_vpc.selected: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 0s [id=3755515486]
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
data.aws_subnets.public: Reading...
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 1s [id=us-east-1]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
module.rds.aws_security_group_rule.bastion_proxy[0]: Refreshing state... [id=sgrule-417166805]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=306319572]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=3858118069]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 1s [id=1846264275]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 1s [id=616917803]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
<= read (data resources)
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [PROD] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api"
name = "integrator-prod-api"
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
"Name" = "integrator-prod-api-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-
... (output truncated)
⚠️ Production changes will be applied after merge to next branch.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
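For reviewers reading the plan above, here is a minimal Python sketch of how the log markers and the log-monitor queries fit together. It is illustrative only and is not taken from `api/poller/ecs_poller.py` or the Terraform module: the function names `classify_stopped_task` and `log_monitor_query` are hypothetical, and the precedence (OOM first, then timeout, then generic failure) is an assumption. The exit codes, marker names, and query shape come from the monitor definitions in the plan.

```python
from typing import Optional

# Exit codes named in the monitor messages in the plan.
OOM_EXIT_CODE = 137       # SIGKILL (128 + 9), reported for OOM kills
SEGFAULT_EXIT_CODE = 139  # SIGSEGV (128 + 11), listed under "Common Causes"


def classify_stopped_task(
    exit_code: Optional[int],
    duration_hours: float,
    timeout_hours: float,
) -> str:
    """Pick the log marker for a task that stopped before its workflow completed.

    Hypothetical helper: the ordering of the checks is an assumption.
    """
    if exit_code == OOM_EXIT_CODE:
        return "WORKFLOW_OOM_KILL"
    if duration_hours >= timeout_hours:
        return "WORKFLOW_TIMEOUT"
    # Anything else (segfault, manual stop, capacity issue, ...) is treated
    # as a generic infrastructure failure.
    return "WORKFLOW_INFRA_FAILURE"


def log_monitor_query(service_filter: str, marker: str, window: str = "30m") -> str:
    """Build a Datadog log-monitor query of the same shape as the plan's queries."""
    return (
        f'logs("service:{service_filter} {marker}")'
        f'.index("*").rollup("count").last("{window}") > 0'
    )


if __name__ == "__main__":
    marker = classify_stopped_task(exit_code=137, duration_hours=0.5, timeout_hours=4.0)
    print(log_monitor_query("integrator-prod-*", marker))
```

Because every log monitor uses `> 0` over a 30-minute window, a single emitted marker is enough to trigger the matching alert, which is why the marker strings need to stay in sync between the emitting code and the monitor queries.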
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.github_actions_role.data.tls_certificate.github: Reading...
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_vpc.selected: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
data.aws_vpc.default: Reading...
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
data.aws_iam_openid_connect_provider.github: Reading...
data.aws_iam_openid_connect_provider.github: Read complete after 0s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 1s [id=integrator-api]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
data.aws_subnets.public: Reading...
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.staging]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
module.rds.data.aws_vpc.selected: Reading...
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4989053128215798288]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 1s [id=integrator-background-runner]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3D5*
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api status=5*\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_security_group.dev_instances will be destroyed
# (because aws_security_group.dev_instances is not in configuration)
- resource "aws_security_group" "dev_instances" {
- arn = "arn:aws:ec2:us-east-1:256586139593:security-group/sg-0bf64ade7c98a1da6" -> null
- description = "Security group for personal dev EC2 instances" -> null
- egress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "Allow all outbound traffic"
- from_port = 0
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "-1"
- security_groups = []
- self = false
- to_port = 0
},
] -> null
- id = "sg-0bf64ade7c98a1da6" -> null
- ingress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "SSH from anywhere (restricted by individual instance tags)"
- from_port = 22
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "tcp"
- security_groups = []
- self = false
- to_port = 22
},
] -> null
- name = "integrator-dev-instances" -> null
- owner_id = "256586139593" -> null
- revoke_rules_on_delete = false -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- vpc_id = "vpc-0529dea5160deb846" -> null
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api"
name = "integrator-staging-api"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:147" -> (known after apply)
# (16 unchanged attributes hidden)
# (4 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_service.poller will be updated in-place
~ resource "aws_ecs_service" "poller" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller"
name = "integrator-staging-poller"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:133" -> (known after apply)
# (16 unchanged attributes hidden)
# (3 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_task_definition.api must be replaced
-/+ resource "aws_ecs_task_definition" "api" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:147" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-api" -> (known after apply)
~ revision = 147 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_ecs_task_definition.poller must be replaced
-/+ resource "aws_ecs_task_definition" "poller" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:133" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-poller" -> (known after apply)
~ revision = 133 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_iam_policy.launch_background_runner will be updated in-place
~ resource "aws_iam_policy" "launch_background_runner" {
id = "arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy"
name = "integrator-staging-api-launch-tasks-policy"
~ policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "ecs:RunTask",
- "ecs:DescribeTasks",
- "ecs:StopTask",
- "ecs:ListTasks",
]
- Effect = "Allow"
- Resource = [
- "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-background-runner:*",
- "arn:aws:ecs:us-east-1:256586139593:task/default/*",
]
},
- {
- Action = [
- "iam:PassRole",
]
- Condition = {
- StringEquals = {
- "iam:PassedToService" = "ecs-tasks.amazon
... (output truncated)
🔵 Comment /apply-staging to apply these changes.
No saved plan found — refusing to apply without a plan file.
⚠️ Apply failed. Please check the output above.
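For context on the refusal above, here is a minimal sketch of a guard that only builds an apply command when a saved plan file exists. It is an illustration, not the pipeline's actual implementation: the function name `build_apply_command` and the plan file path are hypothetical. The only Terraform behaviour assumed is that `terraform apply` accepts a saved plan file as its argument.

```python
from pathlib import Path
from typing import List


def build_apply_command(plan_file: str) -> List[str]:
    """Return the argv for applying a saved plan, or raise if no plan exists.

    Hypothetical helper: the real /apply-staging handler is not shown in this thread.
    """
    if not Path(plan_file).is_file():
        # Mirrors the message in the comment above: never apply without a reviewed plan.
        raise FileNotFoundError(
            "No saved plan found: refusing to apply without a plan file."
        )
    # Applying a saved plan executes exactly what was planned, with no re-planning.
    return ["terraform", "apply", "-input=false", plan_file]
```

Under this sketch, a fresh plan would have to be generated and saved (for example with `terraform plan -out=<file>`) before `/apply-staging` could succeed.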
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
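To illustrate why ordering alone can produce a diff, the sketch below compares two JSON container definitions that differ only in the order of their environment entries. This is not the provider's actual comparison logic. The sample definitions and the names `canonical` and `same_definitions` are made up for the example.

```python
import json
from typing import Any


def canonical(value: Any) -> Any:
    """Recursively sort lists by their JSON form so element order is ignored."""
    if isinstance(value, dict):
        return {k: canonical(v) for k, v in value.items()}
    if isinstance(value, list):
        items = [canonical(v) for v in value]
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True))
    return value


def same_definitions(a: str, b: str) -> bool:
    """True when two JSON documents match once ordering is normalised."""
    return canonical(json.loads(a)) == canonical(json.loads(b))


# Hypothetical definitions: identical content, different environment order.
configured = '[{"name": "api", "environment": [{"name": "A", "value": "1"}, {"name": "B", "value": "2"}]}]'
returned = '[{"name": "api", "environment": [{"name": "B", "value": "2"}, {"name": "A", "value": "1"}]}]'

print(configured == returned)                  # False: the raw strings differ
print(same_definitions(configured, returned))  # True: same content once order is ignored
```

A raw comparison reports a change even though nothing meaningful differs, which is the kind of cosmetic replacement the note describes. A changed value (a new image or env var) still differs after normalisation.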
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.github_actions_role.data.tls_certificate.github: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
data.aws_vpc.default: Reading...
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.ecs_api.data.aws_vpc.selected: Reading...
module.doppler_aws_secrets.data.external.env_vars: Read complete after 1s [id=-]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.data.aws_caller_identity.current: Reading...
data.aws_iam_openid_connect_provider.github: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
data.aws_iam_openid_connect_provider.github: Read complete after 0s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 0s [id=integrator-api]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
data.aws_subnets.public: Reading...
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.data.aws_subnets.public: Reading...
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 1s [id=integrator.staging]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.rds.data.aws_vpc.selected: Reading...
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["GH_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.GH_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4154711961082380403]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 1s [id=integrator-background-runner]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
-/+ destroy and then create replacement
<= read (data resources)
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20%22status%3D5%22
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api \\\"status=5\\\"\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_security_group.dev_instances will be destroyed
# (because aws_security_group.dev_instances is not in configuration)
- resource "aws_security_group" "dev_instances" {
- arn = "arn:aws:ec2:us-east-1:256586139593:security-group/sg-0bf64ade7c98a1da6" -> null
- description = "Security group for personal dev EC2 instances" -> null
- egress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "Allow all outbound traffic"
- from_port = 0
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "-1"
- security_groups = []
- self = false
- to_port = 0
},
] -> null
- id = "sg-0bf64ade7c98a1da6" -> null
- ingress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "SSH from anywhere (restricted by individual instance tags)"
- from_port = 22
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "tcp"
- security_groups = []
- self = false
- to_port = 22
},
] -> null
- name = "integrator-dev-instances" -> null
- owner_id = "256586139593" -> null
- revoke_rules_on_delete = false -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- vpc_id = "vpc-0529dea5160deb846" -> null
}
# module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_secretsmanager_secrets" "synced" {
+ arns = (known after apply)
+ id = (known after apply)
+ names = (known after apply)
+ filter {
+ name = "name"
+ values = [
+ "doppler/integrator/staging/",
]
}
}
# module.doppler_aws_secrets.doppler_secret.secrets["GH_WEBHOOK_SECRET"] will be destroyed
# (because key ["GH_WEBHOOK_SECRET"] is not in for_each map)
- resource "doppler_secret" "secrets" {
- computed = (sensitive value) -> null
- config = "staging" -> null
- id = "integrator.staging.GH_WEBHOOK_SECRET" -> null
- name = "GH_WEBHOOK_SECRET" -> null
- project = "integrator" -> null
- value = (sensitive value) -> null
- value_type = "string" -> null
- visibility = "masked" -> null
}
# module.doppler_aws_secrets.null_resource.wait_for_secrets must be replaced
-/+ resource "null_resource" "wait_for_secrets" {
~ id = "4154711961082380403" -> (known after apply)
~ triggers = { # forces replacement
~ "secret_count" = "57" -> "56"
# (2 unchanged elements hidden)
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api"
name = "integrator-staging-api"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:150" -> (known after apply)
# (16 unchanged attributes hidden)
# (4 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_service.poller will be updated in-place
~ resource "aws_ecs_service" "poller" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller"
name = "integrator-staging-poller"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:136" -> (known after apply)
# (16 unchanged attributes hidden)
# (3 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_task_definition.api must be replaced
-/+ resource "aws_ecs_task_definition" "api" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:150" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-api" -> (known after apply)
~ revision = 150 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_ecs_task_definition.poller must be replaced
-/+ resource "aws_ecs_task_definition" "poller" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:136" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-poller" -> (known after apply)
~ revision = 136 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-s
... (output truncated)
🔵 Comment /apply-staging to apply these changes.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only: real changes (new images, env vars) are still detected correctly.
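To illustrate why an ordering difference alone is cosmetic, here is a minimal Python sketch of an order-insensitive comparison of two container definition lists. It is not how the Terraform AWS provider works internally; the helper names (`normalize`, `same_definitions`) and the sample data are hypothetical. Lists of objects (such as `environment` entries) are sorted before comparing, while lists of scalars (such as `command`) keep their order because it is meaningful there.

```python
import json


def normalize(value):
    """Return a copy of value in which every list of objects is sorted.

    Dict comparison in Python already ignores key order, so only lists
    need attention. Lists of scalars are left in their original order.
    """
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [normalize(item) for item in value]
        if items and all(isinstance(item, dict) for item in items):
            # Sort objects by their canonical JSON form (hypothetical choice).
            return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
        return items
    return value


def same_definitions(stored, returned):
    """True when two definitions differ only in the order of object lists."""
    return normalize(stored) == normalize(returned)


# Hypothetical sample data: same content, different ordering.
stored = [{
    "name": "api",
    "image": "example/api:1",
    "environment": [{"name": "A", "value": "1"}, {"name": "B", "value": "2"}],
}]
returned = [{
    "image": "example/api:1",
    "environment": [{"name": "B", "value": "2"}, {"name": "A", "value": "1"}],
    "name": "api",
}]
# Hypothetical real change: a new image tag.
new_image = [{**returned[0], "image": "example/api:2"}]

print(same_definitions(stored, returned))
print(same_definitions(stored, new_image))
```

Under these assumptions the reordered definition compares equal to the stored one, while a changed image tag does not, which matches the behaviour described in the note: reordering is noise, but new images or env vars still show up as differences.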
module.github_actions_role.data.tls_certificate.github: Reading...
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
module.doppler_aws_secrets.data.external.env_vars: Reading...
data.doppler_secrets.github: Reading...
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
data.doppler_secrets.github: Read complete after 0s [id=integrator.prod]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.datadog_watchdog_monitors.datadog_monitor.fixable_error_rate_anomaly: Refreshing state... [id=258866090]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
module.datadog_watchdog_monitors.datadog_monitor.batch_poller_heartbeat: Refreshing state... [id=258866091]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.prod]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
module.datadog_watchdog_monitors.datadog_monitor.fixable_error_rate_tiered: Refreshing state... [id=259161122]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 0s [id=integration-aws-available-namespaces]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 0s [id=integration-aws-iam-permissions]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.ecs_api.data.aws_vpc.selected: Reading...
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.private_subnets.data.aws_availability_zones.available: Reading...
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.private_subnets.data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.private_subnets.data.aws_vpc.default: Reading...
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.private_subnets.data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 1s [id=ami-02f058bc6b3ee6fcc]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
data.aws_vpc.default: Reading...
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
module.rds.data.aws_vpc.selected: Reading...
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 0s [id=3755515486]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
data.aws_subnets.public: Reading...
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
module.rds.aws_security_group_rule.bastion_proxy[0]: Refreshing state... [id=sgrule-417166805]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 1s [id=us-east-1]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=3797455924]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=2585308447]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 0s [id=1846264275]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 0s [id=616917803]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
-/+ destroy and then create replacement
<= read (data resources)
Terraform will perform the following actions:
# module.datadog_aws_integration.aws_iam_policy.datadog[3] will be updated in-place
~ resource "aws_iam_policy" "datadog" {
id = "arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4"
name = "DatadogIntegrationPolicy-prod-4"
~ policy = jsonencode(
~ {
~ Statement = [
~ {
~ Action = [
- "wisdom:ListQuickResponses",
- "wisdom:ListKnowledgeBases",
"wisdom:ListContents",
# (147 unchanged elements hidden)
"s3:ListAllMyBuckets",
+ "s3:GetIntelligentTieringConfiguration",
"s3:GetBucketTagging",
# (44 unchanged elements hidden)
]
# (2 unchanged attributes hidden)
},
]
# (1 unchanged attribute hidden)
}
)
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
}
# (6 unchanged attributes hidden)
}
# module.datadog_aws_integration.aws_iam_policy.datadog[4] will be updated in-place
~ resource "aws_iam_policy" "datadog" {
id = "arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5"
name = "DatadogIntegrationPolicy-prod-5"
~ policy = jsonencode(
~ {
~ Statement = [
~ {
~ Action = [
# (20 unchanged elements hidden)
"workmail:DescribeOrganization",
+ "wisdom:ListQuickResponses",
+ "wisdom:ListKnowledgeBases",
]
# (2 unchanged attributes hidden)
},
]
# (1 unchanged attribute hidden)
}
)
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
}
# (6 unchanged attributes hidden)
}
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20%22status%3D5%22
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api \\\"status=5\\\"\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after ap
... (output truncated)
⚠️ Production changes will be applied after merge to next branch.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
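To illustrate why an ordering difference alone is not a real change, here is a minimal Python sketch that compares two container definition lists after normalizing their order. It is not how the Terraform AWS provider works internally; the helper names (`normalize`, `is_real_change`) and the sample data are hypothetical, and it assumes each container and each environment entry carries a `name` key.

```python
import json


def normalize(container_definitions):
    """Return a canonical copy: containers sorted by name, env vars sorted by name."""
    canonical = []
    for container in container_definitions:
        item = dict(container)
        # Environment variables are one place where ordering can differ.
        item["environment"] = sorted(
            item.get("environment", []), key=lambda env: env["name"]
        )
        canonical.append(item)
    return sorted(canonical, key=lambda item: item["name"])


def is_real_change(stored, returned):
    """True only if the definitions still differ once ordering is normalized."""
    def as_json(defs):
        return json.dumps(normalize(defs), sort_keys=True)
    return as_json(stored) != as_json(returned)


# Hypothetical data, not taken from the plan output above.
stored = [{"name": "api", "image": "example/api:1", "environment": [
    {"name": "B", "value": "2"}, {"name": "A", "value": "1"}]}]
returned = [{"name": "api", "image": "example/api:1", "environment": [
    {"name": "A", "value": "1"}, {"name": "B", "value": "2"}]}]
new_image = [{"name": "api", "image": "example/api:2", "environment": [
    {"name": "A", "value": "1"}, {"name": "B", "value": "2"}]}]

print(is_real_change(stored, returned))   # reordering only
print(is_real_change(stored, new_image))  # image tag changed
```

The first comparison differs only in environment variable order, so it is not treated as a change; the second has a new image tag and is. A check along these lines could help a reviewer confirm that a `must be replaced` entry is the cosmetic case described in the note.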
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.prod]
data.doppler_secrets.github: Reading...
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.github_actions_role.data.tls_certificate.github: Reading...
module.rds.tls_private_key.bastion[0]: Refreshing state... [id=b458af7b383ee8f3e3fcd9aeaac04b321158c3eb]
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=nUNRxvKsTYaNYc3mOIp8qQ]
module.datadog_watchdog_monitors.datadog_monitor.fixable_error_rate_tiered: Refreshing state... [id=259161122]
module.datadog_watchdog_monitors.datadog_monitor.batch_poller_heartbeat: Refreshing state... [id=258866091]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253047644]
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Reading...
module.datadog_aws_integration.data.datadog_integration_aws_available_namespaces.available: Read complete after 0s [id=integration-aws-available-namespaces]
module.datadog_watchdog_monitors.datadog_monitor.fixable_error_rate_anomaly: Refreshing state... [id=258866090]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-prod]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253047645]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=ZAg6PqW4QYS9vFG1VbVz-g]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Reading...
data.doppler_secrets.github: Read complete after 0s [id=integrator.prod]
module.datadog_aws_integration.data.datadog_integration_aws_iam_permissions.permissions: Read complete after 0s [id=integration-aws-iam-permissions]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-prod]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/prod/poller]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Tr5j8R9oTUmYINowsAYAcw]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password]
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_vpc.selected: Reading...
module.rds.aws_secretsmanager_secret.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899]
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/prod/migrations]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-ecs-exec-policy]
module.bedrock_profiles.data.aws_region.current: Reading...
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.rds.data.aws_ami.amazon_linux[0]: Reading...
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-prod-migrations-lambda-role]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-prod-postgres-20251121153637932500000001]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/prod/api]
module.github_actions_role.aws_iam_openid_connect_provider.github[0]: Refreshing state... [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 1s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-prod-migrations]
module.rds.aws_key_pair.bastion[0]: Refreshing state... [id=integrator-prod-bastion-key]
module.rds.aws_iam_role.bastion[0]: Refreshing state... [id=integrator-prod-bastion-role]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl]
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-doppler-secrets-manager-policy]
module.rds.aws_iam_role.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role]
module.ecs_api.aws_ecr_repository.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.data.aws_vpc.selected: Read complete after 1s [id=vpc-0529dea5160deb846]
module.private_subnets.data.aws_availability_zones.available: Reading...
module.ecs_api.data.aws_caller_identity.current: Reading...
module.ecs_api.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-prod-migrations]
module.private_subnets.data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-prod-api-task-role]
module.private_subnets.data.aws_vpc.default: Reading...
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
github_actions_secret.gh_access_token: Refreshing state... [id=integrator:GH_ACCESS_TOKEN]
module.rds.aws_secretsmanager_secret_version.bastion_ssh_key[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/bastion-ssh-key-1KH899|terraform-20251209070355334400000002]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/prod/rds-readonly-password|terraform-20260113102854430300000001]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-prod]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-prod]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-prod]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-prod]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.rds.data.aws_ami.amazon_linux[0]: Read complete after 1s [id=ami-02f058bc6b3ee6fcc]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=f8180e75-b756-44ea-94f0-ba14f3f34cf3]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-prod]
module.private_subnets.data.aws_vpc.default: Read complete after 0s [id=vpc-0529dea5160deb846]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613044800000003]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.prod]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-prod-migrations-lambda-role-20260204170613005600000002]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/prod/rds-master-password-eUwfvl|terraform-20260102104323671400000002]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-prod-doppler-secrets-manager-role-20251117114311112900000001]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-01a98c3baf47519e1]
module.ecs_api.aws_security_group.vpc_endpoints[0]: Refreshing state... [id=sg-08812bf3835fa7025]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-prod-api-tg/480bcba81e9b7bc4]
module.rds.aws_iam_role_policy_attachment.bastion_ssm[0]: Refreshing state... [id=integrator-prod-bastion-role-20251209065339934800000002]
module.rds.aws_iam_instance_profile.bastion[0]: Refreshing state... [id=integrator-prod-bastion-profile]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-bedrock-access]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-prod-api-task-execution-role-20251119130129752900000001]
module.rds.aws_iam_role_policy_attachment.rds_monitoring[0]: Refreshing state... [id=integrator-prod-rds-monitoring-role-20251121100809599300000002]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-prod-migrations]
module.ecs_api.aws_ecr_lifecycle_policy.api[0]: Refreshing state... [id=integrator-api]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-prod-api-task-role-20260110143245125100000001]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=958j2gt23zd8]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=enps8f03v7hs]
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=8o8r6d38y4zb]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-prod-alb-logs]
module.datadog_aws_integration.datadog_integration_aws_account.this: Refreshing state... [id=f7db3959-cf79-4150-a3bf-411d58e829e7]
module.private_subnets.aws_subnet.private[1]: Refreshing state... [id=subnet-0e6efa013fe2ac723]
module.private_subnets.aws_route_table.private: Refreshing state... [id=rtb-0cec1fb4f58eaade7]
module.private_subnets.aws_subnet.private[0]: Refreshing state... [id=subnet-0c9ccc78412c4983e]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.prod.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.prod.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.prod.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.prod.ANCHOR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=2mg-shi-xe5]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.prod.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.prod.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.prod.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.prod.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.prod.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.prod.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.prod.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.prod.THERMOS_HOST]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=sg9-zyk-4r7]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.prod.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.prod.SLACK_BOT_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.prod.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.prod.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.prod.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.prod.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.prod.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.prod.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.prod.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.prod.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.prod.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.prod.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.prod.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.prod.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.prod.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.prod.VERCEL_AI_GATEWAY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.prod.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.prod.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.prod.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.prod.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.prod.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.prod.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.prod.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.prod.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.prod.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.prod.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.prod.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.prod.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.prod.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.prod.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.prod.THERMOS_PASSWORD]
module.ecs_api.aws_vpc_endpoint.s3[0]: Refreshing state... [id=vpce-01dcb60025fcb1af7]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-0b340dddce9618955]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-lambda-policy]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-cloudwatch-logs-policy]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-prod-20260115155059879900000001]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-prod-api-task-role-20260209135620314900000002]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod]
module.private_subnets.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-004db5cb1a7f82fe4]
data.aws_vpc.default: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_assume_role: Read complete after 0s [id=3755515486]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-prod-alb/a90df99d7efaafb4]
module.private_subnets.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0965ac73c1c272e6d]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_PROD]
module.ecs_api.aws_vpc_endpoint.secretsmanager[0]: Refreshing state... [id=vpce-03bbad757c880b926]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=dd32cab2-43b9-4982-8608-0d9e7ef07008]
module.rds.aws_security_group.rds_client[0]: Refreshing state... [id=sg-0a5f38e269947dd2b]
module.rds.data.aws_vpc.selected: Reading...
module.rds.aws_security_group.bastion[0]: Refreshing state... [id=sg-00fec6e131a7856da]
module.datadog_aws_integration.aws_iam_role.datadog: Refreshing state... [id=DatadogIntegrationRole-prod]
data.aws_subnets.public: Reading...
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_PROD]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=253204905]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-prod-alb/a90df99d7efaafb4/0c03b8bc33aef8d0]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-prod-db-subnet-group]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=204540429253022221]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-prod:integrator-ai-agent-debugging-prod-cost-explorer-policy]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0d7ecef877697558c]
module.clickhouse_privatelink.aws_security_group.clickhouse_vpce: Refreshing state... [id=sg-034330929d9d990c1]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_PROD]
module.rds.aws_security_group_rule.bastion_proxy[0]: Refreshing state... [id=sgrule-417166805]
module.datadog_aws_integration.aws_iam_role_policy_attachment.security_audit: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458727200000001]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Reading...
github_actions_secret.aws_debugging_role_arn: Refreshing state... [id=integrator:AWS_DEBUGGING_ROLE_ARN_PROD]
module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced: Read complete after 0s [id=us-east-1]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=253204904]
module.rds.aws_instance.bastion[0]: Refreshing state... [id=i-0342e072cbf34ad16]
module.rds.aws_db_instance.main: Refreshing state... [id=db-PPFVILK5DKEQIIFO47ASSRDD2A]
module.clickhouse_privatelink.aws_vpc_endpoint.clickhouse: Refreshing state... [id=vpce-042be8f500c67fba7]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=253205240]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=253205236]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=1172274362782256699]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-prod-migrations-lambda-role:integrator-prod-migrations-lambda-access]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=253205239]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=253205235]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=253205238]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=253205241]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=253205237]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-prod-migrations]
module.clickhouse_privatelink.clickhouse_service_private_endpoints_attachment.this: Refreshing state...
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-secrets-access]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-prod-migrations]
module.rds.aws_eip.bastion[0]: Refreshing state... [id=eipalloc-0e0ae9a6a950dc2d6]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-prod-api-task-execution-role-20251127102314997400000001]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role]
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/prod/background-runner]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/prod/ecs-task-state-change]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-secrets-access]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-s3-knowledge-access]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-bedrock-access]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-ecs-exec-policy]
module.ecs_background_runner.aws_ecr_repository.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-prod-ecs-task-role]
module.ecs_background_runner.aws_ecs_cluster.cluster[0]: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727612500000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-prod-ecs-task-execution-role-20251117132727872900000002]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-prod-ecs-task-state-change]
module.ecs_background_runner.aws_ecr_lifecycle_policy.background_runner[0]: Refreshing state... [id=integrator-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecr-policy]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-prod-background-runner]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260209135620251900000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-prod-ecs-task-role-20251118121957970700000001]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-prod-ecs-task-role-20260110150021566200000001]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[0]: Read complete after 0s [id=616917803]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Reading...
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[4]: Read complete after 0s [id=3797455924]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[3]: Read complete after 0s [id=2585308447]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[1]: Read complete after 0s [id=1846264275]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-prod-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-prod-poller]
module.datadog_aws_integration.data.aws_iam_policy_document.datadog_permissions[2]: Read complete after 0s [id=4005895420]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-prod-ecs-task-state-change-LogTaskStateChanges]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=253205244]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-prod-api]
module.datadog_aws_integration.aws_iam_policy.datadog[2]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-3]
module.datadog_aws_integration.aws_iam_policy.datadog[3]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4]
module.datadog_aws_integration.aws_iam_policy.datadog[4]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5]
module.datadog_aws_integration.aws_iam_policy.datadog[0]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-1]
module.datadog_aws_integration.aws_iam_policy.datadog[1]: Refreshing state... [id=arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-2]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-prod-api-task-role-20251119130129778400000002]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-poller]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-prod-api]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=253205262]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[3]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458830300000004]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[2]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458869600000005]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[0]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458870300000006]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=253205259]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=253205261]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[1]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458823800000002]
module.datadog_aws_integration.aws_iam_role_policy_attachment.datadog[4]: Refreshing state... [id=DatadogIntegrationRole-prod-20260122200458829600000003]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-prod:integrator-github-actions-prod-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=253205257]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=253205260]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=253205258]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
-/+ destroy and then create replacement
 <= read (data resources)
Terraform will perform the following actions:
# module.datadog_aws_integration.aws_iam_policy.datadog[3] will be updated in-place
~ resource "aws_iam_policy" "datadog" {
id = "arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-4"
name = "DatadogIntegrationPolicy-prod-4"
~ policy = jsonencode(
~ {
~ Statement = [
~ {
~ Action = [
- "wisdom:ListQuickResponses",
- "wisdom:ListKnowledgeBases",
"wisdom:ListContents",
# (147 unchanged elements hidden)
"s3:ListAllMyBuckets",
+ "s3:GetIntelligentTieringConfiguration",
"s3:GetBucketTagging",
# (44 unchanged elements hidden)
]
# (2 unchanged attributes hidden)
},
]
# (1 unchanged attribute hidden)
}
)
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
}
# (6 unchanged attributes hidden)
}
# module.datadog_aws_integration.aws_iam_policy.datadog[4] will be updated in-place
~ resource "aws_iam_policy" "datadog" {
id = "arn:aws:iam::256586139593:policy/DatadogIntegrationPolicy-prod-5"
name = "DatadogIntegrationPolicy-prod-5"
~ policy = jsonencode(
~ {
~ Statement = [
~ {
~ Action = [
# (20 unchanged elements hidden)
"workmail:DescribeOrganization",
+ "wisdom:ListQuickResponses",
+ "wisdom:ListKnowledgeBases",
]
# (2 unchanged attributes hidden)
},
]
# (1 unchanged attribute hidden)
}
)
tags = {
"Environment" = "prod"
"ManagedBy" = "Terraform"
}
# (6 unchanged attributes hidden)
}
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "253204904"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [PROD] ALB 5xx Errors Critical
+ :rotating_light: [PROD] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [PROD] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
~ name = "[PROD] ALB 5xx Error Rate High" -> "[PROD] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
tags = [
"Environment:prod",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:prod",
"resource:app/integrator-prod-alb/a90df99d7efaafb4",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-prod-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20%22status%3D5%22
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api \\\"status=5\\\"\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-prod-alb/a90df99d7efaafb4
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [PROD] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-prod-alb/a90df99d7efaafb4} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:prod",
+ "resource:app/integrator-prod-alb/a90df99d7efaafb4",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [PROD] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:prod",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [PROD] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [PROD] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-prod-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [PROD] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-prod-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-prod-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-prod-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [PROD] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-prod-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-prod-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-prod-alerts
EOT
+ name = "[PROD] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "1"
+ query = "logs(\"service:integrator-prod-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:prod",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:prod",
+ "resource:integrator-prod-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after ap
... (output truncated)
⚠️ Production changes will be applied after merge to next branch.
ℹ️ Note: The `Plan: 3 to add, 3 to change, 3 to destroy.` output is always expected due to a known Terraform AWS provider issue. ECS task definitions show as `must be replaced` on every plan because the provider stores container definitions in a different order than AWS returns them. This is cosmetic only - real changes (new images, env vars) are still detected correctly.
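
As a rough illustration of why an ordering difference alone can register as a change, the sketch below compares two container-definition lists that hold the same settings in a different order. The field names and values are hypothetical, and this is not the provider's actual diff logic. It only shows that a naive comparison of the serialized JSON reports a difference, while an order-normalized comparison does not.

```python
import json


def normalize(container_defs):
    """Return a canonical copy: env vars sorted by name, containers sorted by name."""
    canonical = []
    for definition in container_defs:
        item = dict(definition)
        item["environment"] = sorted(
            item.get("environment", []), key=lambda env: env["name"]
        )
        canonical.append(item)
    return sorted(canonical, key=lambda d: d["name"])


def naive_equal(a, b):
    """Compare the serialized forms as-is, so any reordering counts as a change."""
    return json.dumps(a) == json.dumps(b)


def normalized_equal(a, b):
    """Compare after sorting keys and list entries, so only real changes differ."""
    return json.dumps(normalize(a), sort_keys=True) == json.dumps(
        normalize(b), sort_keys=True
    )


# Hypothetical container definitions: same content, different ordering.
stored = [
    {
        "name": "api",
        "image": "example/api:1",
        "environment": [
            {"name": "A", "value": "1"},
            {"name": "B", "value": "2"},
        ],
    }
]
returned = [
    {
        "image": "example/api:1",
        "name": "api",
        "environment": [
            {"name": "B", "value": "2"},
            {"name": "A", "value": "1"},
        ],
    }
]

print("naive comparison equal:", naive_equal(stored, returned))
print("normalized comparison equal:", normalized_equal(stored, returned))
```

A genuine change, such as a new image tag, still differs after normalization, which matches the note's point that real changes remain visible.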
module.rds.random_password.readonly[0]: Refreshing state... [id=none]
module.rds.random_password.master[0]: Refreshing state... [id=none]
module.doppler_aws_secrets.data.external.env_vars: Reading...
module.github_actions_role.data.tls_certificate.github: Reading...
module.doppler_aws_secrets.doppler_environment.config: Refreshing state... [id=integrator.staging]
module.doppler_aws_secrets.data.doppler_secrets.existing: Reading...
module.github_actions_role.data.tls_certificate.github: Read complete after 0s [id=772ed8785f2c647baa040d3a1b4aa6cafacb6267]
module.ecs_api.aws_s3_bucket.migrations: Refreshing state... [id=integrator-staging-migrations]
module.doppler_aws_secrets.data.external.env_vars: Read complete after 0s [id=-]
module.bedrock_profiles.data.aws_region.current: Reading...
module.ecs_api.aws_iam_role.migrations_lambda: Refreshing state... [id=integrator-staging-migrations-lambda-role]
module.dev_launch_template.aws_security_group.dev_instances: Refreshing state... [id=sg-0bf64ade7c98a1da6]
module.ecs_api.aws_iam_role.api_task: Refreshing state... [id=integrator-staging-api-task-role]
module.doppler_aws_secrets.aws_iam_role.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role]
module.ecs_api.aws_s3_bucket.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_cloudwatch_log_group.migrations_lambda_logs: Refreshing state... [id=/aws/lambda/integrator-staging-migrations]
module.bedrock_profiles.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.rds.aws_secretsmanager_secret.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password]
module.rds.aws_secretsmanager_secret.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0]
module.doppler_aws_secrets.data.doppler_secrets.existing: Read complete after 0s [id=integrator.staging]
module.artifacts_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-artifacts-staging]
module.rds.data.aws_region.current: Reading...
module.rds.data.aws_region.current: Read complete after 0s [id=us-east-1]
module.knowledge_bucket.aws_s3_bucket.this: Refreshing state... [id=integrator-knowledge-staging]
module.ecs_api.aws_cloudwatch_log_group.api_logs: Refreshing state... [id=/ecs/integrator/staging/api]
module.ecs_api.data.aws_ecr_repository.api[0]: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Reading...
data.aws_iam_openid_connect_provider.github: Reading...
module.ai_agent_debugging_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-ecs-exec-policy]
module.ecs_api.aws_cloudwatch_log_group.poller_logs: Refreshing state... [id=/ecs/integrator/staging/poller]
module.ecs_api.data.aws_elb_service_account.main: Reading...
module.ecs_api.data.aws_elb_service_account.main: Read complete after 0s [id=127311923021]
module.ecs_api.data.aws_caller_identity.current: Reading...
module.doppler_aws_secrets.aws_iam_policy.doppler_secrets_manager: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-doppler-secrets-manager-policy]
module.ecs_api.data.aws_caller_identity.current: Read complete after 1s [id=256586139593]
module.ecs_api.data.aws_vpc.selected: Reading...
data.aws_iam_openid_connect_provider.github: Read complete after 1s [id=arn:aws:iam::256586139593:oidc-provider/token.actions.githubusercontent.com]
data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_api.aws_iam_role.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role]
module.rds.aws_db_parameter_group.main: Refreshing state... [id=integrator-staging-postgres-20251121173128166300000001]
module.ecs_api.data.aws_ecs_cluster.default: Reading...
module.ecs_api.aws_cloudwatch_log_group.migrations_logs: Refreshing state... [id=/ecs/integrator/staging/migrations]
module.github_actions_role.data.aws_caller_identity.current: Reading...
module.github_actions_role.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
data.aws_vpc.default: Reading...
module.datadog_log_pipeline.datadog_apm_retention_filter.error_traces: Refreshing state... [id=h17q4OSSSEie8lqpAgBYyw]
module.ecs_api.data.aws_ecs_cluster.default: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_api.data.aws_ecr_repository.api[0]: Read complete after 1s [id=integrator-api]
module.ecs_api.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.datadog_monitors.datadog_monitor.error_tracking_new_issues: Refreshing state... [id=253065812]
module.datadog_monitors.datadog_monitor.all_errors_log[0]: Refreshing state... [id=253065817]
module.datadog_log_pipeline.datadog_logs_custom_pipeline.integrator_json: Refreshing state... [id=Hfi5qZBoQXq00jWU_RovcA]
module.datadog_log_pipeline.datadog_logs_index.integrator: Refreshing state... [id=integrator-staging]
module.datadog_log_pipeline.datadog_apm_retention_filter.llmobs_traces: Refreshing state... [id=-mP2s46hR6yaYgMyEICDtg]
data.aws_vpc.default: Read complete after 1s [id=vpc-0529dea5160deb846]
module.rds.aws_secretsmanager_secret_version.readonly_password[0]: Refreshing state... [id=integrator/staging/rds-readonly-password|terraform-20260115132739255000000001]
module.rds.aws_secretsmanager_secret_version.master_password[0]: Refreshing state... [id=arn:aws:secretsmanager:us-east-1:256586139593:secret:integrator/staging/rds-master-password-4QWym0|terraform-20251128130346365300000002]
module.doppler_aws_secrets.doppler_integration_aws_secrets_manager.integration: Refreshing state... [id=44a390e7-e35d-44e3-a573-6476e776f406]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_basic: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948100000001]
module.ecs_api.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-bedrock-access]
module.ecs_api.aws_iam_role_policy_attachment.migrations_lambda_vpc: Refreshing state... [id=integrator-staging-migrations-lambda-role-20260204171019948800000002]
module.ecs_api.aws_iam_role_policy_attachment.api_ecs_exec: Refreshing state... [id=integrator-staging-api-task-role-20260110152557084500000001]
module.ai_agent_debugging_role.aws_iam_role.ai_agent_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging]
module.doppler_aws_secrets.aws_iam_role_policy_attachment.doppler_secrets_manager: Refreshing state... [id=integrator-staging-doppler-secrets-manager-role-20251117112527179200000001]
module.ecs_api.aws_iam_role_policy_attachment.api_task_execution: Refreshing state... [id=integrator-staging-api-task-execution-role-20251121173131958900000005]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SEATGEEK_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SEATGEEK_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_DATABASE"]: Refreshing state... [id=integrator.staging.THERMOS_DATABASE]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_YELP_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_YELP_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_SERPAPI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_SERPAPI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.COMPOSIO_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_REVIEWER_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_REVIEWER_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["ANCHOR_API_KEY"]: Refreshing state... [id=integrator.staging.ANCHOR_API_KEY]
module.datadog_dashboards.datadog_dashboard.workflow_debugging[0]: Refreshing state... [id=ea3-u45-7yc]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_SECRET"]: Refreshing state... [id=integrator.staging.TWITTER_API_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["TWITTER_API_KEY"]: Refreshing state... [id=integrator.staging.TWITTER_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["TOOLS_DATABASE_URL"]: Refreshing state... [id=integrator.staging.TOOLS_DATABASE_URL]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_TAVILY_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_TAVILY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["PYTHONPATH"]: Refreshing state... [id=integrator.staging.PYTHONPATH]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_USERNAME"]: Refreshing state... [id=integrator.staging.THERMOS_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["DD_API_KEY"]: Refreshing state... [id=integrator.staging.DD_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_ENDPOINT_URL"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_ENDPOINT_URL]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PASSWORD"]: Refreshing state... [id=integrator.staging.THERMOS_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_PASSWORD"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_PASSWORD]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_MAX_FIXERS_PER_DAY"]: Refreshing state... [id=integrator.staging.WATCHDOG_MAX_FIXERS_PER_DAY]
module.doppler_aws_secrets.doppler_secret.secrets["GITHUB_ACCESS_TOKEN"]: Refreshing state... [id=integrator.staging.GITHUB_ACCESS_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["VERCEL_AI_GATEWAY_API_KEY"]: Refreshing state... [id=integrator.staging.VERCEL_AI_GATEWAY_API_KEY]
module.datadog_dashboards.datadog_dashboard.cost[0]: Refreshing state... [id=dcn-p2a-rya]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_GEMINI_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_GEMINI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_API_KEY"]: Refreshing state... [id=integrator.staging.LANGSMITH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_PORT"]: Refreshing state... [id=integrator.staging.THERMOS_PORT]
module.doppler_aws_secrets.doppler_secret.secrets["GROQ_API_KEY"]: Refreshing state... [id=integrator.staging.GROQ_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["MERCURY_AUTOLOAD"]: Refreshing state... [id=integrator.staging.MERCURY_AUTOLOAD]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_URL"]: Refreshing state... [id=integrator.staging.APOLLO_URL]
module.doppler_aws_secrets.doppler_secret.secrets["ENCRYPTION_KEY"]: Refreshing state... [id=integrator.staging.ENCRYPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["DISABLE_MERCURY_CLEANUP"]: Refreshing state... [id=integrator.staging.DISABLE_MERCURY_CLEANUP]
module.doppler_aws_secrets.doppler_secret.secrets["STATSD_METRICS_FLUSH_WAIT_TIME_SECS"]: Refreshing state... [id=integrator.staging.STATSD_METRICS_FLUSH_WAIT_TIME_SECS]
module.doppler_aws_secrets.doppler_secret.secrets["WATCHDOG_BATCH_POLLER_DISABLED"]: Refreshing state... [id=integrator.staging.WATCHDOG_BATCH_POLLER_DISABLED]
module.doppler_aws_secrets.doppler_secret.secrets["PIPEDREAM_PATH"]: Refreshing state... [id=integrator.staging.PIPEDREAM_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["CLICKHOUSE_USERNAME"]: Refreshing state... [id=integrator.staging.CLICKHOUSE_USERNAME]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_TRACING"]: Refreshing state... [id=integrator.staging.LANGSMITH_TRACING]
module.doppler_aws_secrets.doppler_secret.secrets["PERPLEXITY_API_KEY"]: Refreshing state... [id=integrator.staging.PERPLEXITY_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["HELICONE_API_KEY"]: Refreshing state... [id=integrator.staging.HELICONE_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["OPENAI_API_KEY"]: Refreshing state... [id=integrator.staging.OPENAI_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["GH_WEBHOOK_SECRET"]: Refreshing state... [id=integrator.staging.GH_WEBHOOK_SECRET]
module.doppler_aws_secrets.doppler_secret.secrets["APOLLO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.APOLLO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["BACKEND_READONLY_URL"]: Refreshing state... [id=integrator.staging.BACKEND_READONLY_URL]
module.doppler_aws_secrets.doppler_secret.secrets["LANGSMITH_ENDPOINT"]: Refreshing state... [id=integrator.staging.LANGSMITH_ENDPOINT]
module.doppler_aws_secrets.doppler_secret.secrets["BRANDFETCH_API_KEY"]: Refreshing state... [id=integrator.staging.BRANDFETCH_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["AZURE_OPENAI_SUBSCRIPTION_KEY"]: Refreshing state... [id=integrator.staging.AZURE_OPENAI_SUBSCRIPTION_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_INSTACART_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_INSTACART_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_EXA_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_EXA_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_API_KEY"]: Refreshing state... [id=integrator.staging.COMPOSIO_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["THERMOS_HOST"]: Refreshing state... [id=integrator.staging.THERMOS_HOST]
module.doppler_aws_secrets.doppler_secret.secrets["FIRECRAWL_API_KEY"]: Refreshing state... [id=integrator.staging.FIRECRAWL_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["LINEAR_API_KEY"]: Refreshing state... [id=integrator.staging.LINEAR_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ANTHROPIC_API_KEY"]: Refreshing state... [id=integrator.staging.ANTHROPIC_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["ZAPIER_PATH"]: Refreshing state... [id=integrator.staging.ZAPIER_PATH]
module.doppler_aws_secrets.doppler_secret.secrets["NOTION_API_KEY"]: Refreshing state... [id=integrator.staging.NOTION_API_KEY]
module.doppler_aws_secrets.doppler_secret.secrets["USE_APOLLO_GET_DOWNLOAD_URL_API"]: Refreshing state... [id=integrator.staging.USE_APOLLO_GET_DOWNLOAD_URL_API]
module.doppler_aws_secrets.doppler_secret.secrets["TEST_CONNECTION_ID"]: Refreshing state... [id=integrator.staging.TEST_CONNECTION_ID]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_STAGING_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_STAGING_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["COMPOSIO_ADMIN_TOKEN"]: Refreshing state... [id=integrator.staging.COMPOSIO_ADMIN_TOKEN]
module.doppler_aws_secrets.doppler_secret.secrets["SLACK_BOT_TOKEN"]: Refreshing state... [id=integrator.staging.SLACK_BOT_TOKEN]
module.ecs_api.aws_s3_bucket_public_access_block.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.data.aws_route_tables.main: Reading...
module.ecs_api.data.aws_subnets.public: Reading...
module.ecs_api.aws_lb_target_group.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:targetgroup/integrator-staging-api-tg/4ebc38c1a3c18f91]
module.ecs_api.aws_security_group.alb: Refreshing state... [id=sg-0f268d8e0027eaad9]
module.artifacts_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.data.aws_route_tables.main: Read complete after 0s [id=us-east-1]
module.artifacts_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-artifacts-staging]
module.artifacts_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-artifacts-staging]
module.ecs_api.data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.ecs_api.aws_s3_bucket_lifecycle_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_public_access_block.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.ecs_api.aws_s3_bucket_server_side_encryption_configuration.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.knowledge_bucket.aws_s3_bucket_lifecycle_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_public_access_block.this: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_versioning.this[0]: Refreshing state... [id=integrator-knowledge-staging]
module.knowledge_bucket.aws_s3_bucket_server_side_encryption_configuration.this: Refreshing state... [id=integrator-knowledge-staging]
data.aws_subnets.public: Reading...
module.bedrock_profiles.aws_bedrock_inference_profile.haiku: Refreshing state... [id=71dmct7za78p]
module.bedrock_profiles.aws_bedrock_inference_profile.sonnet: Refreshing state... [id=fg4ld4a7fkp7]
module.bedrock_profiles.aws_bedrock_inference_profile.opus: Refreshing state... [id=1zfxsu353ckg]
data.aws_subnets.public: Read complete after 0s [id=us-east-1]
module.github_actions_role.aws_iam_role.github_actions: Refreshing state... [id=integrator-github-actions-staging]
module.ecs_api.aws_iam_role_policy_attachment.api_bedrock_access: Refreshing state... [id=integrator-staging-api-task-role-20260204130051288600000001]
module.ai_agent_debugging_role.aws_iam_role_policy.iam_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-iam-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.vpc_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-vpc-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_metrics_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-metrics-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecr_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecr-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.alb_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-alb-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.rds_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-rds-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cloudwatch_logs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cloudwatch-logs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.secrets_manager_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-secrets-manager-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.ecs_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-ecs-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.deny_privilege_escalation: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-deny-privilege-escalation]
module.ai_agent_debugging_role.aws_iam_role_policy.s3_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-s3-policy]
module.ai_agent_debugging_role.aws_iam_role_policy.cost_explorer_debugging: Refreshing state... [id=integrator-ai-agent-debugging-staging:integrator-ai-agent-debugging-staging-cost-explorer-policy]
module.ecs_api.aws_security_group.api: Refreshing state... [id=sg-06a8634c658cb6242]
module.ecs_api.aws_s3_bucket_policy.alb_logs[0]: Refreshing state... [id=integrator-staging-alb-logs]
module.doppler_aws_secrets.doppler_secrets_sync_aws_secrets_manager.sync: Refreshing state... [id=3cddcf0b-4014-410d-a552-88d3e5022617]
module.rds.aws_db_subnet_group.main: Refreshing state... [id=integrator-staging-db-subnet-group]
module.rds.data.aws_vpc.selected: Reading...
github_actions_secret.ecs_subnets: Refreshing state... [id=integrator:ECS_SUBNETS_STAGING]
module.ecs_api.aws_lb.api: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:loadbalancer/app/integrator-staging-alb/5e4f301bcd0263cb]
module.github_actions_role.aws_iam_role_policy_attachment.terraform_admin[0]: Refreshing state... [id=integrator-github-actions-staging-20260115154744404400000001]
module.github_actions_role.aws_iam_role_policy.lambda: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-lambda-policy]
module.github_actions_role.aws_iam_role_policy.cloudwatch_logs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-cloudwatch-logs-policy]
module.datadog_infra_alerts.datadog_monitor.alb_unhealthy_targets[0]: Refreshing state... [id=254278294]
module.doppler_aws_secrets.null_resource.wait_for_secrets: Refreshing state... [id=4154711961082380403]
github_actions_secret.aws_role_arn: Refreshing state... [id=integrator:AWS_ROLE_ARN_STAGING]
github_actions_secret.ecs_security_groups: Refreshing state... [id=integrator:ECS_SECURITY_GROUPS_STAGING]
module.ecs_api.aws_lb_listener.http: Refreshing state... [id=arn:aws:elasticloadbalancing:us-east-1:256586139593:listener/app/integrator-staging-alb/5e4f301bcd0263cb/d7549a73fdd1a11a]
module.datadog_infra_alerts.datadog_monitor.alb_5xx[0]: Refreshing state... [id=254279883]
module.rds.data.aws_vpc.selected: Read complete after 0s [id=vpc-0529dea5160deb846]
module.rds.aws_security_group.rds: Refreshing state... [id=sg-0c78e576a056ba44a]
module.rds.aws_db_instance.main: Refreshing state... [id=db-I57KURHD3UXFT56QM2OTWRW7TQ]
module.rds.null_resource.create_readonly_user[0]: Refreshing state... [id=3885368352693534014]
module.datadog_infra_alerts.datadog_monitor.rds_queue_depth: Refreshing state... [id=254278249]
module.datadog_infra_alerts.datadog_monitor.rds_cpu: Refreshing state... [id=254278250]
module.datadog_infra_alerts.datadog_monitor.rds_connections: Refreshing state... [id=254278252]
module.datadog_infra_alerts.datadog_monitor.rds_write_latency: Refreshing state... [id=254278253]
module.datadog_infra_alerts.datadog_monitor.rds_read_latency: Refreshing state... [id=254278251]
module.datadog_infra_alerts.datadog_monitor.rds_memory: Refreshing state... [id=254278254]
module.datadog_infra_alerts.datadog_monitor.rds_storage: Refreshing state... [id=254278255]
module.ecs_api.aws_iam_policy.api_secrets_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-secrets-access]
module.ecs_api.aws_iam_role_policy.migrations_lambda_access: Refreshing state... [id=integrator-staging-migrations-lambda-role:integrator-staging-migrations-lambda-access]
module.ecs_api.aws_lambda_function.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_api.aws_iam_role_policy_attachment.api_secrets_access: Refreshing state... [id=integrator-staging-api-task-execution-role-20251127102440078700000001]
module.ecs_api.aws_ecs_task_definition.migrations: Refreshing state... [id=integrator-staging-migrations]
module.ecs_background_runner.data.aws_caller_identity.current: Reading...
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Reading...
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Reading...
module.ecs_background_runner.aws_cloudwatch_log_group.task_logs: Refreshing state... [id=/ecs/integrator/staging/background-runner]
module.ecs_background_runner.data.aws_caller_identity.current: Read complete after 0s [id=256586139593]
module.ecs_background_runner.aws_iam_role.task: Refreshing state... [id=integrator-staging-ecs-task-role]
module.ecs_background_runner.aws_iam_policy.ecs_exec: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-exec-policy]
module.ecs_background_runner.aws_iam_policy.s3_knowledge_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-s3-knowledge-access]
module.ecs_background_runner.aws_iam_role.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role]
module.ecs_background_runner.aws_cloudwatch_log_group.ecs_task_events: Refreshing state... [id=/aws/events/integrator/staging/ecs-task-state-change]
module.ecs_background_runner.data.aws_ecs_cluster.default[0]: Read complete after 0s [id=arn:aws:ecs:us-east-1:256586139593:cluster/default]
module.ecs_background_runner.aws_iam_policy.bedrock_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-bedrock-access]
module.ecs_background_runner.aws_iam_policy.secrets_manager_access: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-ecs-secrets-access]
module.ecs_background_runner.aws_cloudwatch_event_rule.ecs_task_state_change: Refreshing state... [id=integrator-staging-ecs-task-state-change]
module.ecs_background_runner.aws_iam_role_policy_attachment.ecs_exec: Refreshing state... [id=integrator-staging-ecs-task-role-20251121173131775000000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.s3_knowledge_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260110152558095500000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.bedrock_access: Refreshing state... [id=integrator-staging-ecs-task-role-20260204130051819900000004]
module.ecs_background_runner.aws_iam_role_policy_attachment.task_execution: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173130862100000002]
module.ecs_background_runner.aws_iam_role_policy_attachment.secrets_access: Refreshing state... [id=integrator-staging-ecs-task-execution-role-20251121173131402400000003]
module.ecs_background_runner.aws_cloudwatch_event_target.ecs_task_state_change_logs: Refreshing state... [id=integrator-staging-ecs-task-state-change-LogTaskStateChanges]
module.ecs_background_runner.data.aws_ecr_repository.background_runner[0]: Read complete after 1s [id=integrator-background-runner]
module.ecs_background_runner.aws_ecs_task_definition.background_runner: Refreshing state... [id=integrator-staging-background-runner]
module.github_actions_role.aws_iam_role_policy.ecr: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecr-policy]
module.ecs_api.aws_iam_policy.launch_background_runner: Refreshing state... [id=arn:aws:iam::256586139593:policy/integrator-staging-api-launch-tasks-policy]
module.ecs_api.aws_ecs_task_definition.poller: Refreshing state... [id=integrator-staging-poller]
module.ecs_api.aws_ecs_task_definition.api: Refreshing state... [id=integrator-staging-api]
module.datadog_infra_alerts.datadog_monitor.background_runner_memory: Refreshing state... [id=254278257]
module.ecs_api.aws_ecs_service.poller: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller]
module.ecs_api.aws_ecs_service.api: Refreshing state... [id=arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api]
module.ecs_api.aws_iam_role_policy_attachment.api_launch_tasks: Refreshing state... [id=integrator-staging-api-task-role-20251121182311833700000002]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_memory: Refreshing state... [id=254278262]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_task_failures: Refreshing state... [id=254278261]
module.datadog_infra_alerts.datadog_monitor.ecs_poller_cpu: Refreshing state... [id=254278263]
module.github_actions_role.aws_iam_role_policy.ecs: Refreshing state... [id=integrator-github-actions-staging:integrator-github-actions-staging-ecs-policy]
module.datadog_infra_alerts.datadog_monitor.ecs_api_memory: Refreshing state... [id=254278265]
module.datadog_infra_alerts.datadog_monitor.ecs_api_task_failures: Refreshing state... [id=254278266]
module.datadog_infra_alerts.datadog_monitor.ecs_api_cpu: Refreshing state... [id=254278264]
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
- destroy
-/+ destroy and then create replacement
<= read (data resources)
Terraform will perform the following actions:
# module.datadog_infra_alerts.datadog_monitor.alb_502[0] will be created
+ resource "datadog_monitor" "alb_502" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 502 Bad Gateway Errors Detected
**502 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The backend is returning invalid responses or crashing.
**Action Required:**
- Check ECS task health - are containers crashing?
- Review container logs for application errors
- Check for OOM kills or segfaults
- Verify health check endpoint is working
**Common Causes:**
- Application crash during request handling
- Container OOM kill mid-request
- Misconfigured health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 502 Errors Recovered
Bad gateway errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 502 Bad Gateway"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_502{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:bad_gateway",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_504[0] will be created
+ resource "datadog_monitor" "alb_504" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] ALB 504 Gateway Timeouts Detected
**504 Count:** {{value}}
**Critical Threshold:** 0
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** Requests are being dropped because the API didn't respond in time.
**Action Required:**
- Check API service health - is it running?
- Check ECS task CPU/memory - is it overloaded?
- Check RDS connections/CPU - is the database a bottleneck?
- Review slow endpoints in APM traces
- Check for long-running synchronous operations
**Immediate Steps:**
1. Check ECS task count: Are tasks running?
2. Check container logs for errors
3. Check RDS metrics for connection exhaustion
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 504 Timeouts Recovered
Gateway timeout errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] ALB 504 Gateway Timeout"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_504{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:gateway_timeout",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:alb",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.alb_5xx[0] will be updated in-place
~ resource "datadog_monitor" "alb_5xx" {
id = "254279883"
~ message = <<-EOT
{{#is_alert}}
- :rotating_light: [STAGING] ALB 5xx Errors Critical
+ :rotating_light: [STAGING] ALB 5xx Errors Detected
**5xx Count:** {{value}}
- **Critical Threshold:** 50
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- **Action Required:** High 5xx errors indicate backend failures.
+ **Action Required:** 5xx errors indicate backend failures.
- Check ECS service health and task status
- Review application logs for errors
- Verify target group health checks
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
- {{#is_warning}}
- :warning: [STAGING] ALB 5xx Errors Warning
-
- **5xx Count:** {{value}}
- **Warning Threshold:** 10
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
- {{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] ALB 5xx Errors Recovered
- **5xx Count:** {{value}}
- **Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
-
- 5xx error rate returned to normal levels.
+ 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
~ name = "[STAGING] ALB 5xx Error Rate High" -> "[STAGING] ALB 5xx Errors"
~ query = "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 50" -> "sum(last_5m):sum:aws.applicationelb.httpcode_elb_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
tags = [
"Environment:staging",
"ManagedBy:Terraform",
"alert_type:infrastructure",
"environment:staging",
"resource:app/integrator-staging-alb/5e4f301bcd0263cb",
"service:alb",
"team:integrations",
]
# (16 unchanged attributes hidden)
~ monitor_thresholds {
~ critical = "50" -> "0"
- warning = "10" -> null
}
}
# module.datadog_infra_alerts.datadog_monitor.api_5xx_logs will be created
+ resource "datadog_monitor" "api_5xx_logs" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API 5xx Errors Detected in Application Logs
**Error Count:** {{value}} in last 30 minutes
**Service:** integrator-staging-api
**What This Means:** The API is returning 5xx errors (server errors).
**Log Attributes:**
- path: The endpoint that failed
- method: HTTP method (GET, POST, etc.)
- duration_ms: Response time
- client_ip: Caller IP
**Action Required:**
- Check application logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20%22status%3D5%22
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application Logs)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api \\\"status=5\\\"\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.api_target_5xx[0] will be created
+ resource "datadog_monitor" "api_target_5xx" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] API Application 5xx Errors Detected
**5xx Count:** {{value}}
**Load Balancer:** app/integrator-staging-alb/5e4f301bcd0263cb
**What This Means:** The API is returning 500 errors (unhandled exceptions).
**Action Required:**
- Check API service logs for stack traces
- Review recent deployments for breaking changes
- Check database connectivity
- Look for OOM or resource exhaustion
**View Logs:** https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20status%3Aerror
**View ALB Metrics:** https://app.datadoghq.com/dash/integration/aws_elb
{{/is_alert}}
{{#is_recovery}}
:white_check_mark: [STAGING] API 5xx Errors Recovered
Application 5xx errors cleared.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] API 5xx Errors (Application)"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "sum(last_5m):sum:aws.applicationelb.httpcode_target_5xx{loadbalancer:app/integrator-staging-alb/5e4f301bcd0263cb} > 0"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:application",
+ "environment:staging",
+ "resource:app/integrator-staging-alb/5e4f301bcd0263cb",
+ "service:api",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_execution_failed will be created
+ resource "datadog_monitor" "workflow_infra_execution_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Task Stopped Unexpectedly
**Alert:** A workflow's ECS task stopped unexpectedly (not OOM or timeout).
**What Happened:** The ECS task terminated before the workflow completed.
**Quick Links:**
- [View Execution Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_INFRA_FAILURE)
- [Container Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, exit_code, stopped_reason, container_reason, duration_hours, datadog_log_link
**Common Causes:**
- Container crash (segfault exit_code=139, unhandled exception)
- Task was stopped manually
- ECS capacity issues
- Health check failures
**Action Required:**
- Click the logs link to find exit_code and stopped_reason
- Review container logs for errors
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Execution Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_INFRA_FAILURE\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_execution",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_infra_initiation_failed will be created
+ resource "datadog_monitor" "workflow_infra_initiation_failed" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:x: [STAGING] Workflow Failed to Start
**Alert:** The API failed to initiate an ECS task for a workflow.
**What Happened:** The workflow was accepted but failed during ECS task creation.
**Quick Links:**
- [View Initiation Failure Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-api%20WORKFLOW_INFRA_INITIATION_FAILED)
- [ECS Service Events](https://console.aws.amazon.com/ecs/home?region=us-east-1#/clusters)
**Log Attributes Available:** workflow_id, app_name, workflow_name, error_type, error_message, failure_reason
**Common Causes:**
- Invalid workflow configuration
- ECS capacity issues (no available Fargate capacity)
- IAM permission issues
- Container image not found
- Secrets/environment variable issues
**Action Required:**
- Click the logs link to find the error_message with specific failure reason
- Review ECS service events in AWS Console
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Infra Initiation Failed"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-api WORKFLOW_INFRA_INITIATION_FAILED\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:infra_initiation",
+ "environment:staging",
+ "resource:api",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_memory_high will be created
+ resource "datadog_monitor" "workflow_memory_high" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:rotating_light: [STAGING] Workflow Memory Critical - OOM Risk
**Memory Utilization:** {{value}}%
**Critical Threshold:** 85%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Workflow approaching memory limit - OOM kill imminent.
- Check the specific workflow for memory-intensive operations
- Review agent execution patterns and data processing
- Consider terminating long-running workflows if memory continues rising
- May need to increase task memory allocation for this workflow type
**View Container Metrics:** https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner
{{/is_alert}}
{{#is_warning}}
:warning: [STAGING] Workflow Memory Warning
**Memory Utilization:** {{value}}%
**Warning Threshold:** 75%
**Task Family:** integrator-staging-background-runner
**Task ARN:** {{task_arn.name}}
**Action Required:** Monitor memory usage - may need intervention.
{{/is_warning}}
{{#is_recovery}}
:white_check_mark: [STAGING] Workflow Memory Recovered
**Memory Utilization:** {{value}}%
**Task Family:** integrator-staging-background-runner
Memory utilization returned to safe levels.
{{/is_recovery}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Memory High"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "avg(last_5m):( avg:container.memory.usage{task_family:integrator-staging-background-runner} by {task_arn} / avg:container.memory.limit{task_family:integrator-staging-background-runner} by {task_arn} ) * 100 > 85"
+ renotify_interval = 0
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:memory",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "metric alert"
+ monitor_thresholds {
+ critical = "85"
+ warning = "75"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_oom_kill will be created
+ resource "datadog_monitor" "workflow_oom_kill" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:skull: [STAGING] Workflow OOM Kill Detected
**Alert:** A workflow container was killed due to Out of Memory (exit code 137)
**What Happened:** The container exceeded its memory limit and was terminated by the kernel (SIGKILL).
**Quick Links:**
- [View OOM Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_OOM_KILL)
- [Container Memory Metrics](https://app.datadoghq.com/containers?query=task_family%3Aintegrator-staging-background-runner)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, exit_code, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Review memory usage patterns in the killed container
- Check for memory leaks in agent code or data processing
- Consider increasing task memory allocation if this recurs
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow OOM Kill Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_OOM_KILL\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:oom",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.datadog_infra_alerts.datadog_monitor.workflow_timeout will be created
+ resource "datadog_monitor" "workflow_timeout" {
+ draft_status = "published"
+ evaluation_delay = (known after apply)
+ id = (known after apply)
+ include_tags = true
+ message = <<-EOT
{{#is_alert}}
:hourglass: [STAGING] Workflow Timeout Detected
**Alert:** A workflow exceeded its timeout threshold and was terminated.
**What Happened:** The workflow ran longer than its configured timeout and was stopped by the poller.
**Quick Links:**
- [View Timeout Logs](https://app.datadoghq.com/logs?query=service%3Aintegrator-staging-*%20WORKFLOW_TIMEOUT)
**Log Attributes Available:** workflow_id, app_name, tool_name, duration_hours, timeout_hours, datadog_log_link
**Action Required:**
- Click the logs link above to find the specific workflow
- Check if the workflow is stuck or legitimately needs more time
- Consider increasing timeout_hours if this workflow type needs more time
- Review for infinite loops or inefficient processing
{{/is_alert}}
@slack-integrator-staging-alerts
EOT
+ name = "[STAGING] Workflow Timeout Detected"
+ new_host_delay = 300
+ notify_audit = false
+ notify_no_data = false
+ priority = "3"
+ query = "logs(\"service:integrator-staging-* WORKFLOW_TIMEOUT\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"
+ renotify_interval = 60
+ require_full_window = false
+ tags = [
+ "Environment:staging",
+ "ManagedBy:Terraform",
+ "alert_type:timeout",
+ "environment:staging",
+ "resource:integrator-staging-background-runner",
+ "service:workflow",
+ "team:integrations",
]
+ timeout_h = 0
+ type = "log alert"
+ monitor_thresholds {
+ critical = "0"
}
}
# module.dev_launch_template.aws_security_group.dev_instances will be destroyed
# (because aws_security_group.dev_instances is not in configuration)
- resource "aws_security_group" "dev_instances" {
- arn = "arn:aws:ec2:us-east-1:256586139593:security-group/sg-0bf64ade7c98a1da6" -> null
- description = "Security group for personal dev EC2 instances" -> null
- egress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "Allow all outbound traffic"
- from_port = 0
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "-1"
- security_groups = []
- self = false
- to_port = 0
},
] -> null
- id = "sg-0bf64ade7c98a1da6" -> null
- ingress = [
- {
- cidr_blocks = [
- "0.0.0.0/0",
]
- description = "SSH from anywhere (restricted by individual instance tags)"
- from_port = 22
- ipv6_cidr_blocks = []
- prefix_list_ids = []
- protocol = "tcp"
- security_groups = []
- self = false
- to_port = 22
},
] -> null
- name = "integrator-dev-instances" -> null
- owner_id = "256586139593" -> null
- revoke_rules_on_delete = false -> null
- tags = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- tags_all = {
- "Environment" = "dev"
- "ManagedBy" = "terraform"
- "Name" = "integrator-dev-instances-sg"
} -> null
- vpc_id = "vpc-0529dea5160deb846" -> null
}
# module.doppler_aws_secrets.data.aws_secretsmanager_secrets.synced will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_secretsmanager_secrets" "synced" {
+ arns = (known after apply)
+ id = (known after apply)
+ names = (known after apply)
+ filter {
+ name = "name"
+ values = [
+ "doppler/integrator/staging/",
]
}
}
# module.doppler_aws_secrets.doppler_secret.secrets["GH_WEBHOOK_SECRET"] will be destroyed
# (because key ["GH_WEBHOOK_SECRET"] is not in for_each map)
- resource "doppler_secret" "secrets" {
- computed = (sensitive value) -> null
- config = "staging" -> null
- id = "integrator.staging.GH_WEBHOOK_SECRET" -> null
- name = "GH_WEBHOOK_SECRET" -> null
- project = "integrator" -> null
- value = (sensitive value) -> null
- value_type = "string" -> null
- visibility = "masked" -> null
}
# module.doppler_aws_secrets.null_resource.wait_for_secrets must be replaced
-/+ resource "null_resource" "wait_for_secrets" {
~ id = "4154711961082380403" -> (known after apply)
~ triggers = { # forces replacement
~ "secret_count" = "57" -> "56"
# (2 unchanged elements hidden)
}
}
# module.ecs_api.aws_ecs_service.api will be updated in-place
~ resource "aws_ecs_service" "api" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-api"
name = "integrator-staging-api"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:150" -> (known after apply)
# (16 unchanged attributes hidden)
# (4 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_service.poller will be updated in-place
~ resource "aws_ecs_service" "poller" {
id = "arn:aws:ecs:us-east-1:256586139593:service/default/integrator-staging-poller"
name = "integrator-staging-poller"
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-poller-service"
}
~ task_definition = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:136" -> (known after apply)
# (16 unchanged attributes hidden)
# (3 unchanged blocks hidden)
}
# module.ecs_api.aws_ecs_task_definition.api must be replaced
-/+ resource "aws_ecs_task_definition" "api" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api:150" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-api" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-api" -> (known after apply)
~ revision = 150 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-staging-api"
}
# (10 unchanged attributes hidden)
}
# module.ecs_api.aws_ecs_task_definition.poller must be replaced
-/+ resource "aws_ecs_task_definition" "poller" {
~ arn = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller:136" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:256586139593:task-definition/integrator-staging-poller" -> (known after apply)
~ container_definitions = (sensitive value) # forces replacement
~ enable_fault_injection = false -> (known after apply)
~ id = "integrator-staging-poller" -> (known after apply)
~ revision = 136 -> (known after apply)
tags = {
"Environment" = "staging"
"ManagedBy" = "Terraform"
"Name" = "integrator-s
... (output truncated)
🔵 Comment /apply-staging to apply these changes.
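For reviewers reading the plan above, here is a minimal sketch of what the `api_5xx_logs` resource could look like in the module source. It is reconstructed only from the attribute values shown in the plan, not copied from the module. The variable names `var.environment`, `var.api_service_name`, and `var.slack_channel` are assumptions made for illustration, and the message is abbreviated.

```hcl
# Hypothetical inputs; the real module may name or structure these differently.
variable "environment" {
  type    = string
  default = "staging"
}

variable "api_service_name" {
  type    = string
  default = "integrator-staging-api"
}

variable "slack_channel" {
  type    = string
  default = "@slack-integrator-staging-alerts"
}

resource "datadog_monitor" "api_5xx_logs" {
  name = "[${upper(var.environment)}] API 5xx Errors (Application Logs)"
  type = "log alert"

  # Text search on "status=5" rather than attribute search (status:5*),
  # because the middleware writes the status into the message text.
  # \\\" renders as \" inside the Datadog query, matching the plan output.
  query = "logs(\"service:${var.api_service_name} \\\"status=5\\\"\").index(\"*\").rollup(\"count\").last(\"30m\") > 0"

  message = <<-EOT
    {{#is_alert}}
    :rotating_light: [${upper(var.environment)}] API 5xx Errors Detected in Application Logs
    **Error Count:** {{value}} in last 30 minutes
    {{/is_alert}}
    {{#is_recovery}}
    :white_check_mark: [${upper(var.environment)}] API 5xx Errors Recovered
    {{/is_recovery}}
    ${var.slack_channel}
  EOT

  # Critical-only: any matching log line in the window triggers the alert.
  monitor_thresholds {
    critical = "0"
  }

  notify_no_data      = false
  require_full_window = false
  renotify_interval   = 60

  tags = [
    "environment:${var.environment}",
    "service:api",
    "alert_type:application",
    "team:integrations",
  ]
}
```

The escaping on the `query` line is the part most worth checking in review: the plan prints `\\\"status=5\\\"`, so the string Datadog receives contains `\"status=5\"` nested inside the outer `logs("...")` call.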