In this post, I'll show you how easy it is to publish custom metrics into AWS CloudWatch using AWS Lambda Powertools and the CloudWatch Embedded Metric Format (EMF) specification.
But first, the backstory.
Backstory
I've been studying for the AWS Advanced Networking Specialty exam, so my head is full of Direct Connect Hosted or Dedicated Connections, Transit VIFs and when to use VPN over Direct Connect.
One of the things that usually gets tested is knowing the standard CloudWatch metrics AWS provides for the various components above. As we are using Direct Connect at work I can have a poke around there and familiarise myself with the various metrics available.
Whilst doing this, I noticed there are no BGP-specific metrics available via CloudWatch. However, within the UI we do get the following for each of our Virtual Interfaces.
So how can we get these? We need to use the AWS CLI describe-virtual-interfaces command.
Running aws directconnect describe-virtual-interfaces against our target account will display the BGP status of all our VIFs.
Within the output, each VIF has a list object of its bgpPeers:
"bgpPeers": [
    {
        "bgpPeerId": "dxpeer-ffdfqwqdh",
        "asn": 111111,
        "authKey": "**********",
        "addressFamily": "ipv4",
        "amazonAddress": "1.2.3.4/30",
        "customerAddress": "1.2.3.4/30",
        "bgpPeerState": "available", <-------
        "bgpStatus": "up", <-------
        "awsDeviceV2": "*****-********",
        "awsLogicalDeviceId": "*****-********"
    }
]
That's great, but I want to monitor this via CloudWatch, and maybe pump it into a remote monitoring solution like Splunk.
How can I get these metrics without running the CLI?
Easy, this is AWS, so let's build it.
The Lambda
First, we need to recreate our AWS CLI call within a Lambda. This is very easy since it is a single call and only requires the directconnect:DescribeVirtualInterfaces IAM permission.
The code performs the following:
- Creates some lookup dicts that map each status string to a numeric value for our metrics
- Opens a boto3 client for Direct Connect
- Grabs all the virtual interfaces within the target account
- Loops through the output and extracts the values we are interested in
import boto3

# bgpPeerState mapping
bgp_peer_state_lookup = {
    'verifying': 0,  # The BGP peering addresses or ASN require validation before the BGP peer can be created. This state applies only to public virtual interfaces.
    'pending': 2,    # The BGP peer is created, and remains in this state until it is ready to be established.
    'available': 1,  # The BGP peer is ready to be established.
    'deleting': 3,   # The BGP peer is being deleted.
    'deleted': 4     # The BGP peer is deleted and cannot be established.
}

# bgpStatus mapping
bgp_status_lookup = {
    'down': 0,     # The BGP peer is down.
    'unknown': 2,  # The BGP peer status is not available.
    'up': 1        # The BGP peer is established. This state does not indicate the state of the routing function. Ensure that you are receiving routes over the BGP session.
}


def handler(event, context):
    client = boto3.client('directconnect')
    response = client.describe_virtual_interfaces()
    virtual_interfaces = response.get('virtualInterfaces', [])

    for virtual_interface in virtual_interfaces:
        owner_account = virtual_interface.get('ownerAccount', None)
        virtual_interface_id = virtual_interface.get('virtualInterfaceId', None)
        location = virtual_interface.get('location', None)
        virtual_interface_type = virtual_interface.get('virtualInterfaceType', None)
        virtual_interface_name = virtual_interface.get('virtualInterfaceName', None)
        bgp_peers = virtual_interface.get('bgpPeers', [])

        if bgp_peers:
            # Only the first BGP peer on the interface is inspected
            bgp_peer = bgp_peers[0]
            bgp_peer_state = bgp_peer.get('bgpPeerState', None)
            bgp_status = bgp_peer.get('bgpStatus', None)
            # Convert bgp_peer_state and bgp_status to numeric values
            numeric_bgp_peer_state = bgp_peer_state_lookup.get(bgp_peer_state, None)
            numeric_bgp_status = bgp_status_lookup.get(bgp_status, None)
        else:
            bgp_peer_state = None
            bgp_status = None
            numeric_bgp_peer_state = None
            numeric_bgp_status = None
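The handler above only reads the first entry in bgpPeers, but a virtual interface can carry more than one peer (for example an IPv4 and an IPv6 session). Here is a minimal sketch of scoring every peer instead. The helper name peer_metrics, the sample IDs and the returned dict keys are all made up for illustration and are not part of the original Lambda.

```python
# Hypothetical helper, not part of the original Lambda: score every BGP peer
# on a virtual interface instead of only the first one.

BGP_PEER_STATE_LOOKUP = {'verifying': 0, 'available': 1, 'pending': 2, 'deleting': 3, 'deleted': 4}
BGP_STATUS_LOOKUP = {'down': 0, 'up': 1, 'unknown': 2}


def peer_metrics(virtual_interface):
    """Return one dict per BGP peer with its numeric state and status."""
    rows = []
    for peer in virtual_interface.get('bgpPeers', []):
        rows.append({
            'bgp_peer_id': peer.get('bgpPeerId'),
            'address_family': peer.get('addressFamily'),
            # None means the API returned a value that is missing from the lookup
            'peer_state': BGP_PEER_STATE_LOOKUP.get(peer.get('bgpPeerState')),
            'status': BGP_STATUS_LOOKUP.get(peer.get('bgpStatus')),
        })
    return rows


# Made-up sample shaped like one entry of the describe-virtual-interfaces output
sample_vif = {
    'virtualInterfaceId': 'dxvif-example',
    'bgpPeers': [
        {'bgpPeerId': 'dxpeer-aaa', 'addressFamily': 'ipv4', 'bgpPeerState': 'available', 'bgpStatus': 'up'},
        {'bgpPeerId': 'dxpeer-bbb', 'addressFamily': 'ipv6', 'bgpPeerState': 'pending', 'bgpStatus': 'down'},
    ],
}

for row in peer_metrics(sample_vif):
    print(row)
```

If you go this way, you would probably want the peer ID or address family as an extra metric dimension, and you should skip any row whose value is None rather than trying to publish it.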
Lambda Powertools
Lambda Powertools is a must for anyone wanting to get the most out of Lambda with very little effort.
I won't go into too much detail, as the online doco, hosted at https://docs.powertools.aws.dev/lambda/python/latest/, is very comprehensive and has loads of examples.
We add this to our Lambda via a Lambda Layer which can be done via CloudFormation.
Parameters:
  LambdaPowerLayerArn:
    Type: String
    Description: Retrieve Powertools for AWS Lambda (Python) https://docs.powertools.aws.dev/lambda/python/latest/
    Default: 'arn:aws:lambda:ap-southeast-2:017000801446:layer:AWSLambdaPowertoolsPythonV2:40'
Resources:
  BGPVirtualInterfaceStatus:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Layers:
        - !Ref LambdaPowerLayerArn
Apart from the standard logging capabilities, what we are mainly interested in for this purpose is the support for the Amazon CloudWatch Embedded Metric Format (EMF).
Using this functionality, we can post our metrics to CloudWatch with a few lines of code.
# At the top of the module
from aws_lambda_powertools import single_metric
from aws_lambda_powertools.metrics import MetricUnit

# Inside the loop over virtual interfaces.
# single_metric needs a namespace, passed as namespace=... or set via the
# POWERTOOLS_METRICS_NAMESPACE environment variable.
with single_metric(name="BGPPeerState", unit=MetricUnit.Count, value=numeric_bgp_peer_state) as metric:
    metric.add_dimension(name="VirtualInterfaceID", value=virtual_interface_id)
    metric.add_dimension(name="Location", value=location)
    metric.add_dimension(name="AccountOwner", value=owner_account)
    metric.add_dimension(name="Type", value=virtual_interface_type)
    metric.add_dimension(name="Name", value=virtual_interface_name)

with single_metric(name="BGPStatus", unit=MetricUnit.Count, value=numeric_bgp_status) as metric:
    metric.add_dimension(name="VirtualInterfaceID", value=virtual_interface_id)
    metric.add_dimension(name="Location", value=location)
    metric.add_dimension(name="AccountOwner", value=owner_account)
    metric.add_dimension(name="Type", value=virtual_interface_type)
    metric.add_dimension(name="Name", value=virtual_interface_name)
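To make it a bit clearer what EMF actually is, here is a rough hand-rolled sketch of the kind of JSON log line the specification describes. This is not how Powertools builds it internally and its output will differ in detail; the function name build_emf_record, the namespace DXMetrics and the sample dimension values are assumptions for illustration only.

```python
import json
import time


def build_emf_record(namespace, metric_name, value, dimensions, unit="Count"):
    """Build a dict laid out like an Embedded Metric Format log record."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set containing every dimension name
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # The metric value sits at the top level, keyed by metric name
        metric_name: value,
    }
    # Dimension values also sit at the top level, keyed by dimension name
    record.update(dimensions)
    return record


record = build_emf_record(
    namespace="DXMetrics",  # hypothetical namespace
    metric_name="BGPStatus",
    value=1,
    dimensions={"VirtualInterfaceID": "dxvif-example", "Location": "EXAMPLE"},
)
print(json.dumps(record))
```

In a Lambda, a line like this written to stdout lands in CloudWatch Logs, and CloudWatch extracts the metric from it. That is why no PutMetricData call or extra IAM permission shows up in the code above.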
Scheduling
We now use an Amazon EventBridge rule to schedule our Lambda to run every minute.
Schedule:
  DependsOn: LambdaLogGroup
  Type: AWS::Events::Rule
  Properties:
    ScheduleExpression: rate(1 minute)
    State: ENABLED
    Targets:
      - Arn: !GetAtt BGPVirtualInterfaceStatus.Arn
        Id: "TargetFunctionV1"
PermissionToInvokeLambda:
  Type: AWS::Lambda::Permission
  Properties:
    FunctionName: !Ref BGPVirtualInterfaceStatus
    Action: lambda:InvokeFunction
    Principal: events.amazonaws.com
    SourceArn: !Sub ${Schedule.Arn}
Putting it all together
With an IAM Role and some parameters we now have a fully contained CloudFormation template that will generate all our metrics.
The full CloudFormation is available at https://github.com/sjramblings/cloudwatch-dx-metrics/blob/main/DXMetrics.yaml
We can use these to troubleshoot or alert when someone is being naughty with our BGP config.
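As an example of the alerting side, here is a minimal sketch of the arguments for a CloudWatch alarm that fires when BGPStatus drops below 1 (up). It is not part of the linked template. The function name, alarm name, namespace DXMetrics and dimension values are all hypothetical, and the dimensions you pass must match exactly what your metric was published with.

```python
def bgp_down_alarm_kwargs(namespace, dimensions, alarm_name):
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": "BGPStatus",
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Statistic": "Minimum",
        "Period": 60,              # the Lambda publishes once a minute
        "EvaluationPeriods": 3,    # three bad minutes in a row
        "Threshold": 1,            # 1 means 'up' in the status lookup
        "ComparisonOperator": "LessThanThreshold",
        # Treat a silent Lambda as a problem too
        "TreatMissingData": "breaching",
    }


kwargs = bgp_down_alarm_kwargs(
    namespace="DXMetrics",  # hypothetical namespace
    dimensions={"VirtualInterfaceID": "dxvif-example"},  # made-up, incomplete set
    alarm_name="dx-bgp-down-dxvif-example",
)
print(kwargs)

# To create the alarm (needs AWS credentials and cloudwatch:PutMetricAlarm):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**kwargs)
```

One thing to watch with the numeric mapping used in this post: unknown is 2, so a "less than 1" alarm only catches down. If you care about unknown as well, you may want a second alarm or a different mapping.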
Summary
As you can see above, it is easy with Lambda Powertools to publish custom metrics with dimensions into CloudWatch. The only thing that is stopping you is your imagination!
The full code for the above is available on GitHub here
Hope this helps someone else!
Cheers