Supercharge Your AWS CloudWatch Metrics with Lambda Powertools

In this post, I'll show you how easy it is to publish custom metrics to AWS CloudWatch using AWS Lambda Powertools and the CloudWatch Embedded Metric Format (EMF) specification.

But first, the backstory.

Backstory

I've been studying for the AWS Network Specialty exam so my head is full of Direct Connect Hosted or Dedicated Connections, Transit VIFs and when to use VPN over Direct Connect.

One of the things that usually gets tested is knowing the standard CloudWatch metrics AWS provides for the various components above. As we are using Direct Connect at work I can have a poke around there and familiarise myself with the various metrics available.

Whilst doing this, I noticed there are no BGP-specific metrics available via CloudWatch. However, within the UI we do get the following for each of our Virtual Interfaces.

VIF BGP Status

So how can we get these? We need to use the AWS CLI describe-virtual-interfaces command.

Running aws directconnect describe-virtual-interfaces against our target account displays the BGP status of all our VIFs.

Within the output, each virtual interface has a bgpPeers list:

"bgpPeers": [
    {
        "bgpPeerId": "dxpeer-ffdfqwqdh",
        "asn": 111111,
        "authKey": "**********",
        "addressFamily": "ipv4",
        "amazonAddress": "1.2.3.4/30",
        "customerAddress": "1.2.3.4/30",
        "bgpPeerState": "available",    <-------
        "bgpStatus": "up",              <-------
        "awsDeviceV2": "*****-********",
        "awsLogicalDeviceId": "*****-********"
    }
]
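For a quick check before building anything, the CLI's JSON output can be filtered with a few lines of Python. This is only a sketch: the function name summarise_bgp and the sample interface ID dxvif-example are made up for illustration, and it assumes the output has the virtualInterfaces / bgpPeers shape shown above.

```python
import json


def summarise_bgp(cli_output: str) -> list:
    """Return (virtualInterfaceId, bgpPeerId, bgpStatus) for every BGP peer."""
    data = json.loads(cli_output)
    rows = []
    for vif in data.get("virtualInterfaces", []):
        for peer in vif.get("bgpPeers", []):
            rows.append(
                (
                    vif.get("virtualInterfaceId"),
                    peer.get("bgpPeerId"),
                    peer.get("bgpStatus"),
                )
            )
    return rows


# Hypothetical sample mirroring the structure of the CLI output above.
sample = json.dumps(
    {
        "virtualInterfaces": [
            {
                "virtualInterfaceId": "dxvif-example",
                "bgpPeers": [
                    {
                        "bgpPeerId": "dxpeer-ffdfqwqdh",
                        "bgpPeerState": "available",
                        "bgpStatus": "up",
                    }
                ],
            }
        ]
    }
)

for row in summarise_bgp(sample):
    print(*row)
```

In practice the input would be the saved output of the describe-virtual-interfaces command rather than the inline sample.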

That's great, but I want to monitor this via CloudWatch, and maybe pump it into a remote monitoring solution like Splunk.

How can I get these metrics without running the CLI?

Easy, this is AWS, so let's build it.

The Lambda

First, we need to recreate our AWS CLI call within a Lambda. This is very easy since it is a single call and only requires the directconnect:DescribeVirtualInterfaces IAM permission.

The code does the following:

  • Creates lookup dicts that map each status string to a numeric metric value.
  • Opens a boto3 client for Direct Connect.
  • Grabs all the virtual interfaces within the target account.
  • Loops through the output and extracts the values we are interested in.

import boto3, os

# bgpPeerState mapping
bgp_peer_state_lookup = {
    'verifying': 0,  # The BGP peering addresses or ASN require validation before the BGP peer can be created. This state applies only to public virtual interfaces.
    'pending': 2,  # The BGP peer is created, and remains in this state until it is ready to be established.
    'available': 1,  # The BGP peer is ready to be established.
    'deleting': 3,  # The BGP peer is being deleted.
    'deleted': 4  # The BGP peer is deleted and cannot be established.
}

bgp_status_lookup = {
    'down': 0,  # The BGP peer is down.
    'unknown': 2,  # The BGP peer status is not available.
    'up': 1  # The BGP peer is established. This state does not indicate the state of the routing function. Ensure that you are receiving routes over the BGP session.
}

def handler(event, context):
    client = boto3.client('directconnect')
    response = client.describe_virtual_interfaces()

    virtual_interfaces = response.get('virtualInterfaces', [])

    results = []
    for virtual_interface in virtual_interfaces:
        owner_account = virtual_interface.get('ownerAccount', None)
        virtual_interface_id = virtual_interface.get('virtualInterfaceId', None)
        location = virtual_interface.get('location', None)
        virtual_interface_type = virtual_interface.get('virtualInterfaceType', None)
        virtual_interface_name = virtual_interface.get('virtualInterfaceName', None)

        bgp_peers = virtual_interface.get('bgpPeers', [])
        if bgp_peers:
            bgp_peer = bgp_peers[0]
            bgp_peer_state = bgp_peer.get('bgpPeerState', None)
            bgp_status = bgp_peer.get('bgpStatus', None)

            # Convert bgp_peer_state and bgp_status to numeric values
            numeric_bgp_peer_state = bgp_peer_state_lookup.get(bgp_peer_state, None)
            numeric_bgp_status = bgp_status_lookup.get(bgp_status, None)
        else:
            # No BGP peers on this virtual interface, so there is nothing to report
            bgp_peer_state = None
            bgp_status = None
            numeric_bgp_peer_state = None
            numeric_bgp_status = None
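
The handler above only looks at the first entry in bgpPeers. A virtual interface can carry more than one BGP peer (for example one for IPv4 and one for IPv6), so one possible variation is to produce a row per peer. The sketch below is an illustration only: the helper name peer_rows and the row keys are hypothetical, and it skips any peer whose state or status is missing or not in the lookup tables.

```python
# Same numeric mappings as the lookup dicts in the handler above.
BGP_PEER_STATE_LOOKUP = {
    "verifying": 0,
    "available": 1,
    "pending": 2,
    "deleting": 3,
    "deleted": 4,
}
BGP_STATUS_LOOKUP = {"down": 0, "up": 1, "unknown": 2}


def peer_rows(virtual_interface: dict) -> list:
    """Return one dict of numeric values per BGP peer on a virtual interface."""
    rows = []
    for peer in virtual_interface.get("bgpPeers", []):
        state = BGP_PEER_STATE_LOOKUP.get(peer.get("bgpPeerState"))
        status = BGP_STATUS_LOOKUP.get(peer.get("bgpStatus"))
        if state is None or status is None:
            # Unrecognised or missing value: skip rather than publish a bogus metric.
            continue
        rows.append(
            {
                "address_family": peer.get("addressFamily"),
                "peer_state": state,
                "status": status,
            }
        )
    return rows


# Hypothetical virtual interface with three peers, one of them incomplete.
example_vif = {
    "virtualInterfaceId": "dxvif-example",
    "bgpPeers": [
        {"addressFamily": "ipv4", "bgpPeerState": "available", "bgpStatus": "up"},
        {"addressFamily": "ipv6", "bgpPeerState": "pending", "bgpStatus": "down"},
        {"addressFamily": "ipv4", "bgpPeerState": "available"},
    ],
}

for row in peer_rows(example_vif):
    print(row)
```

Publishing per peer would also need a dimension that tells the peers apart, such as the address family, which is a design choice the original single-peer version avoids.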

Lambda Powertools

Lambda Powertools is a must for anyone wanting to get the most out of Lambda with very little effort.

I won't go into too much detail, as the online documentation at https://docs.powertools.aws.dev/lambda/python/latest/ is very comprehensive and has loads of examples.

We add this to our Lambda via a Lambda Layer, which can be done via CloudFormation.

Parameters:
  LambdaPowerLayerArn:
    Type: String
    Description: Retrieve Powertools for AWS Lambda (Python) https://docs.powertools.aws.dev/lambda/python/latest/
    Default: 'arn:aws:lambda:ap-southeast-2:017000801446:layer:AWSLambdaPowertoolsPythonV2:40'
Resources:
  BGPVirtualInterfaceStatus:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Layers:
        - !Ref LambdaPowerLayerArn

Apart from the standard logging capabilities, the feature we are mainly interested in here is support for the Amazon CloudWatch Embedded Metric Format (EMF). With EMF, metrics are written as structured JSON log lines, and CloudWatch extracts them into metrics for us.

Using this functionality, we can post our metrics to CloudWatch with a few lines of code.

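from aws_lambda_powertools import single_metric
from aws_lambda_powertools.metrics import MetricUnit

# These blocks run inside the loop over virtual interfaces. The metric namespace is taken
# from the POWERTOOLS_METRICS_NAMESPACE environment variable unless namespace= is passed.
with single_metric(name="BGPPeerState", unit=MetricUnit.Count, value=numeric_bgp_peer_state) as metric: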
    metric.add_dimension(name="VirtualInterfaceID", value=virtual_interface_id)
    metric.add_dimension(name="Location", value=location)
    metric.add_dimension(name="AccountOwner", value=owner_account)
    metric.add_dimension(name="Type", value=virtual_interface_type)
    metric.add_dimension(name="Name", value=virtual_interface_name)

with single_metric(name="BGPStatus", unit=MetricUnit.Count, value=numeric_bgp_status) as metric:
    metric.add_dimension(name="VirtualInterfaceID", value=virtual_interface_id)
    metric.add_dimension(name="Location", value=location)
    metric.add_dimension(name="AccountOwner", value=owner_account)
    metric.add_dimension(name="Type", value=virtual_interface_type)
    metric.add_dimension(name="Name", value=virtual_interface_name)
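
To make it clearer what EMF means in practice, here is a rough, hand-rolled sketch of the kind of JSON record that ends up in the Lambda's log stream. It does not use Powertools, and the exact record Powertools emits may differ in detail. The function name build_emf_record, the namespace DirectConnect/BGP and the dimension values are all assumptions for illustration.

```python
import json
import time


def build_emf_record(namespace, name, value, dimensions, unit="Count", timestamp_ms=None):
    """Build a single-metric record following the Embedded Metric Format layout."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # EMF timestamps are epoch milliseconds
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set containing every dimension name.
                    "Dimensions": [list(dimensions)],
                    "Metrics": [{"Name": name, "Unit": unit}],
                }
            ],
        },
        # The metric value sits at the top level, keyed by metric name.
        name: value,
    }
    # Dimension values also sit at the top level, keyed by dimension name.
    record.update(dimensions)
    return record


# Hypothetical namespace and dimension values.
record = build_emf_record(
    namespace="DirectConnect/BGP",
    name="BGPStatus",
    value=1,
    dimensions={"VirtualInterfaceID": "dxvif-example", "Type": "private"},
)

# Printing the JSON to stdout is what gets the record into CloudWatch Logs from a Lambda.
print(json.dumps(record))
```

Seeing the record laid out like this also explains why each dimension added with add_dimension becomes part of the metric's identity in CloudWatch.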

Scheduling

We now use an Amazon EventBridge rule to run our Lambda on a schedule, once every minute.

  Schedule:
    DependsOn: LambdaLogGroup
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: rate(1 minute)
      State: ENABLED
      Targets:
        - Arn: !GetAtt BGPVirtualInterfaceStatus.Arn
          Id: "TargetFunctionV1"

  PermissionToInvokeLambda:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref BGPVirtualInterfaceStatus
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !Sub ${Schedule.Arn}

Putting it all together

With an IAM Role and some parameters, we now have a self-contained CloudFormation template that will generate all our metrics.

The full CloudFormation is available at https://github.com/sjramblings/cloudwatch-dx-metrics/blob/main/DXMetrics.yaml

We can use these to troubleshoot or alert when someone is being naughty with our BGP config.
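
As one way of alerting, a CloudWatch alarm could watch the BGPStatus metric. The sketch below only builds the parameters for the CloudWatch put_metric_alarm call. The helper name build_bgp_alarm_params, the namespace, the alarm name and the dimension values are assumptions for illustration, and the three-minute evaluation window is an arbitrary choice. Note that the alarm has to list every dimension the metric was published with.

```python
def build_bgp_alarm_params(namespace: str, dimensions: dict) -> dict:
    """Build put_metric_alarm parameters that fire when BGPStatus drops to 0 (down)."""
    return {
        "AlarmName": f"bgp-status-down-{dimensions['VirtualInterfaceID']}",
        "Namespace": namespace,
        "MetricName": "BGPStatus",
        # Must match the full set of dimensions used when publishing the metric.
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Statistic": "Minimum",
        "Period": 60,  # matches the one-minute schedule
        "EvaluationPeriods": 3,
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",  # 0 = down, 1 = up
        "TreatMissingData": "breaching",  # no data usually means the Lambda stopped reporting
    }


# Hypothetical namespace and dimension values.
params = build_bgp_alarm_params(
    namespace="DirectConnect/BGP",
    dimensions={
        "VirtualInterfaceID": "dxvif-example",
        "Location": "ExampleLocation",
        "AccountOwner": "111111111111",
        "Type": "private",
        "Name": "example-vif",
    },
)
print(params["AlarmName"])

# To create the alarm (needs AWS credentials and the cloudwatch:PutMetricAlarm permission):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

With the mapping used in this post, an unknown status is published as 2, so this particular threshold would not catch it; a second alarm or a different mapping would be needed for that case.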

CloudWatch Dimensions

CloudWatch Metrics

CloudWatch Graph

Summary

As you can see above, Lambda Powertools makes it easy to publish custom metrics with dimensions into CloudWatch. The only thing stopping you is your imagination!

The full code for the above is available on GitHub at https://github.com/sjramblings/cloudwatch-dx-metrics

Hope this helps someone else!

Cheers
