Cloud Monitoring
Cloud Monitoring
Amazon CloudWatch Metrics
- CloudWatch provides metrics for every services in AWS
- Metric is a variable to monitor ( CPUUtilization, Networking...)
- Metrics have timestamps
- Can create CloudWatch dashboards of metrics
Important Metrics
- EC2 instances: CPU Utilization, Status Checks, Network (not RAM)
- Default metrics every 5 minutes
- Option for Detailed Monitoring ($$$): metrics every 1 minute
- EBS volumes: Disk Read/Writes
- S3 buckets: BucketSizeBytes, NumberOfObjects, AllRequests
- Billing: Total Estimated Charge (only in us-east-1)
- Service Limits: how many you've been using a service API
- Custom mertics: push your own metrics
Amazon CloudWatch Alarms
- Alarms are used to trigger notifications for any metric
- Alarms actions
- Auto Scaling: increase or decrease EC2 instances "desired" count
- EC2 Actions: stop, terminate, reboot or recover an EC2 instance
- SNS notifications: send a notification into an SNS topic
- Various options (sampling, %, max, min, etc...)
- Can choose the period on which to evaluate an alarm
- Example: create a billing alarm on the CloudWatch Billing metric
- Alarm States: OK, INSUFFICIENT_DATA, ALARM
Amazon CloudWatch Logs
- CloudWatch Logs can collect log from:
- Elastic Beanstalk: collection of logs from application
- ECS: collection from containers
- AWS Lambda: collection from function logs
- CloudTrail based on filter
- CloudWatch log agents: on EC2 machines or on-premises servers
- Route53: Log DNS queries
- Enables real-time monitoring of logs
- Adjustable CloudWatch Logs retention
CloudWatch Logs for EC2
- By default, no logs from your EC2 instance will go to CloudWatch
- You need to run a CloudWatch agent on EC2 to push the log files you want
- Make sure IAM permissions are correct
- The CloudWatch log agent can be setup on-premises too
Amazon EventBridge (formerly CloudWatch Events)
- Schedule: Cron jobs (Scheduled scripts)
- Event Pattern: Event rules to react to a service doing something
- Trigger Lambda functions, send SQS/SNS messages...
- Schema Registry: model event schema
- You can archive events (all/filter) sent to an event bus (indefinitely or set period)
- Ability to replay archived events
AWS CloudTrail
- Provides governance, compliance and audit for your AWS Account
- CloudTrail is enabled by default
- Get an history of events/API calls made within your AWS Account by:
- Console
- SDK
- CLI
- AWS Services
- Can put logs from CloudTrail into CloudWatch Logs or S3
- A trail can be applied to All Regions (default) or a single Region.
- If a resource is deleted in AWS, investigate CloudTrail first
AWS X-Ray
- Debugging in Production, the good old way:
- Test locally
- Add log statements everywhere
- Re-deploy in production
- Log formats differ across applications and log analysis is hard.
- Debugging: one big monolith "easy", distributed services "hard"
- No common views of your entire architecture
- Solution is X-Ray
AWS X-Ray advantages
- Troubleshooting performance (bottlenecks)
- Understand dependencies in a microservice architecture
- Pinpoint service issues
- Review request behaviour
- Find errors and exceptions
- Are we meeting time SLA?
- Where I am throttled?
- Identify users that are impacted
Amazon CodeGuru (decomissioned)
- An ML-powered service for automated code reviews and application performance recommendations
- Provides two functionalities
- CodeGuru Reviewer: automated code reviews for static code analysis (development)
- CodeGuru Profiler: visibility/recommendations about application performance during runtime (production)
Amazon CodeGuru Reviewer
- Identify critical issues, security vulnerabilities, and hard to find bugs
- Example: common coding best practices, resource leaks, security detection, input validation
- Uses Machine Learning and automated reasoning
- Hard-learned lessons across millions of code reviews on 1000s of open-source and Amazon repositories
- Supports Java and Python
- Integrates with GitHub, Bitbucket and AWS CodeCommit
Amazon CodeGuru Profiler
- Helps understand the runtime behaviour of your application
- Example: identify if your application is consuming excessive CPU capacity on a logging routine
- Features:
- Identify and remove code inefficiencies
- Improve application performance (e.g., reduce CPU utilization)
- Decrease compute costs
- Provides heap summary (identify which objects using up memory)
- Anomaly Detection
- Support applications running on AWS or on-premise
- Minimal overhead on application
AWS Service Health Dashboard - Service History
- Shows all regions, all services health
- Shows historical information for each day
- Has an RSS feed you can subscribe to
- Previously called AWS Service Health Dashboard
AWS Account Heatlh Dashboard - Your Account
- Previously called AWS Personal Health Dashboard (PHD)
- AWS Account Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you
- While the Service Health Dashboard displays the general status of AWS services, Account Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources
- The dashboard displays relevant and timely information to help you manage events in progress and provides proactive notification to help you plan for scheduled activities.
- Can aggregate data from an entire AWS Organization
- Global Service
- Show how AWS outages directly impact you and your AWS resources
- Alert, remediation, proactive, scheduled activities
Monitoring Summary
- CloudWatch
- Metrics: monitor the performance of AWS services and billing metrics
- Alarms: automate notification, perform EC2 action, notify to SNS based on metric
- Logs: collect log files from EC2 instances, servers, Lambda functions...
- Events ( or EventBridge): react to events in AWS, or trigger rule on a schedule
- CloudTrail: audit API calls made within your AWS Account
- CloudTrail insights: automated analysis of your CloudTrail Events
- X-Ray: trace requests made through your distributed applications
- AWS Health Dashboard: status of all AWS services across all regions
- AWS Account Health Dashboard: AWS events that impact your infrastructure
- Amazon CodeGuru: automated code reviews and application performance recommandations









Comments
Post a Comment