GitHub Actions is a CI/CD (Continuous Integration/Continuous Deployment) platform integrated into GitHub, allowing users to automate software development workflows, such as building, testing, and deploying code.
Actions are triggered by various events on GitHub, such as pushes to a repository or the creation of pull requests, and run on virtual machines provided by GitHub or on self-hosted runners.
When running a GitHub Enterprise Server instance, you must host your own runners if you want to use GitHub Actions.
This post is about the excellent open-source project we tested for hosting our runners.
Philips-Labs / terraform-aws-github-runner
Hosted on GitHub, this project provides ephemeral AWS SPOT runner instances, scaled by Lambda functions that are invoked through API Gateway.
Some of the features include:
- GitHub App Support
- Enterprise, Org or Repo-Level Webhooks
- Runner Binary Updates
- Multiple Runner Types / Operating Systems
- Runner Pools & Scheduling
- Re-Use / Ephemeral
- Default / Custom AMI
- Custom UserData
All of the above make it an attractive option for running self-hosted runners.
There are some great examples in the project repository to get you started; however, because we use API Gateway endpoints within our VPCs, we had to modify things a little to use a Private API Gateway endpoint.
NOTE: Once we migrate our instance to Enterprise Managed Users, we can use the project without modification.
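A Private API Gateway endpoint is only reachable through an interface VPC endpoint, and it needs a resource policy that says which callers may invoke it. The sketch below builds such a policy, assuming you want to admit only traffic arriving through one VPC endpoint. It is a hypothetical helper, not part of the terraform-aws-github-runner module, and the endpoint ID is a placeholder; it only illustrates the kind of change a private setup involves.

```python
import json


def private_api_resource_policy(vpce_id: str) -> dict:
    """Build an API Gateway resource policy that only admits calls arriving
    through a single VPC endpoint (hypothetical helper, not part of the
    terraform-aws-github-runner module)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny any invocation that did not come through our VPC endpoint.
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            },
            {
                # Allow everything that survives the Deny above.
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
            },
        ],
    }


if __name__ == "__main__":
    # vpce-0123456789abcdef0 is a placeholder, not a real endpoint ID.
    print(json.dumps(private_api_resource_policy("vpce-0123456789abcdef0"), indent=2))
```

The explicit Deny with `StringNotEquals` means that anything not arriving via the named endpoint is rejected, even if another statement would allow it.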
Here is the configuration we run today, giving us scalable ephemeral runners.
Let's walk through it
First up, our GitHub App sends workflow_job events to our Private API Gateway, which triggers a scale-up (the creation of a SPOT instance) via the Scale Runners Up Lambda.
As the diagram shows, we run these SPOT instances in a separate VPC, disconnected from our internal network, whose only outbound internet access is via a NAT Gateway.
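The receiving side of that webhook boils down to two questions: is the delivery genuinely from our GitHub App, and does the job warrant a new runner? The sketch below answers both, assuming GitHub's `X-Hub-Signature-256` header (an HMAC-SHA256 of the body, prefixed with `sha256=`) and a simple rule that only `queued` jobs whose labels are all offered by our runner type trigger a scale-up. The function names and the label rule are illustrative assumptions; the project's own Lambdas implement their own, configurable validation and label matching.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header ("sha256=<hex digest>")."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison so timing does not leak how much matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def should_scale_up(body: bytes, runner_labels: set[str]) -> bool:
    """Decide whether a workflow_job payload should start a new runner.

    Assumption: only "queued" jobs whose requested labels are all provided
    by our runner type are accepted (labels compared case-insensitively).
    """
    event = json.loads(body)
    if event.get("action") != "queued":
        return False  # in_progress / completed events never scale up
    requested = {label.lower() for label in event["workflow_job"]["labels"]}
    return requested <= {label.lower() for label in runner_labels}
```

Verifying the signature before parsing the body keeps unauthenticated payloads from ever reaching the scaling logic.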
Once the SPOT instance comes online, it registers with our GitHub Organisation, processes the job, and then terminates, because we run in ephemeral mode.
If, for some reason, the job doesn't kill off the runner, the Scale Runners Down Lambda, which runs every minute, will pick it up.
NOTE: There are all sorts of configuration options available, such as pooling and idle runners; check out the project README!
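The scale-down sweep can be pictured as a filter over running instances: terminate any runner that is idle and older than a grace period, so a freshly launched instance is not killed before it has had a chance to register and pick up its job. The sketch below assumes a hypothetical `Runner` record whose `busy` flag would come from GitHub's self-hosted runner API; the 5-minute grace period is illustrative, not the project's default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Runner:
    instance_id: str
    launched_at: datetime
    busy: bool  # is GitHub currently reporting a job on this runner?


def runners_to_terminate(runners, now, minimum_running_time=timedelta(minutes=5)):
    """Return IDs of idle runners that have outlived the grace period.

    Assumption: the grace period value is illustrative; a real sweep would
    also deregister the runner from GitHub before terminating the instance.
    """
    return [
        r.instance_id
        for r in runners
        if not r.busy and now - r.launched_at >= minimum_running_time
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    fleet = [
        Runner("i-old-idle", now - timedelta(minutes=30), busy=False),
        Runner("i-old-busy", now - timedelta(minutes=30), busy=True),
        Runner("i-new-idle", now - timedelta(minutes=1), busy=False),
    ]
    print(runners_to_terminate(fleet, now))  # ['i-old-idle']
```

Running this every minute means a stuck ephemeral runner lingers for at most the grace period plus one sweep interval.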
Custom Image
We use a custom image, generated on a daily schedule and built using HashiCorp Packer, shown in the bottom left of the diagram.
The need for speed
We use a custom image to ensure our operating environment has everything our workflows need to succeed. Most of this is cached container images, as we hit Docker Hub rate limiting early on when kicking off many workflows concurrently.
From the workflow event to the job running, we consistently hit about the one-minute mark. This could be improved further with pooling; however, I would then have to swallow the cost of idle instances.
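Why rate limiting bites so quickly here: every runner leaves the VPC through the same NAT Gateway, so Docker Hub sees all anonymous pulls as coming from a single public IP. The sketch below counts the pulls a burst of jobs would make, assuming each job reuses an image already present on the instance instead of pulling it again. The image names, job count and limit are hypothetical; check Docker Hub's currently published limits rather than the placeholder used here.

```python
def hub_pulls(jobs: int, images_per_job: list[str], cached: set[str]) -> int:
    """Count Docker Hub pulls for a burst of concurrent jobs.

    All runners share the NAT Gateway's public IP, so every pull counts
    against the same per-IP quota. Assumption: cached images are used as-is
    and trigger no registry pull.
    """
    uncached = [img for img in images_per_job if img not in cached]
    return jobs * len(uncached)


if __name__ == "__main__":
    images = ["node:18", "python:3.11", "postgres:15"]  # hypothetical job images
    assumed_limit = 100  # placeholder, not Docker Hub's current figure

    cold = hub_pulls(50, images, cached=set())
    warm = hub_pulls(50, images, cached=set(images))  # images baked into the AMI
    print(cold, cold > assumed_limit)  # 150 True
    print(warm, warm > assumed_limit)  # 0 False
```

Because the pull count scales with the number of concurrent jobs, baking the images into the daily image removes the dependency on the burst size entirely.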
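The pooling trade-off is easy to put a number on: a warm pool saves the boot time on each job, but you pay for every hour those instances sit idle. The sketch below estimates that idle cost; the pool size, SPOT price and idle hours are hypothetical inputs, not our real figures or AWS prices.

```python
def monthly_pool_cost(pool_size: int, hourly_price: float,
                      idle_hours_per_day: float, days: int = 30) -> float:
    """Cost of keeping `pool_size` warm runners idle (all inputs hypothetical)."""
    return pool_size * hourly_price * idle_hours_per_day * days


if __name__ == "__main__":
    # Hypothetical: 5 warm instances at $0.04/hour, idle 10 hours a day.
    print(f"${monthly_pool_cost(5, 0.04, 10):.2f} per month")  # $60.00 per month
```

Weighing that figure against roughly a minute of start-up latency per job is what decides whether a pool is worth it.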
This post showed the excellent work that has gone into the https://github.com/philips-labs/terraform-aws-github-runner project.
Please go check it out and give them a star!
The next post will show it being used in anger, spinning up about 300 instances when we had to roll out a fix in response to GitHub's "GitHub Actions: Deprecating save-state and set-output commands" post.
I hope this helps someone else!