Semgrep is a fantastic tool. As the website rightly states:
"Static analysis at ludicrous speed. Find bugs and enforce code standards."
We use it in our open-source community, where standards apply not just to code but also to documentation, all of which we wrap up in a style or contribution guide.
The Style Guide
As open-source communities evolve and grow with many contributors, standards usually need to be defined so that maintainers are not constantly commenting on pull requests to ask for style-related fixes.
Within our Terraform community, we have tackled this by defining a style guide that covers naming, syntax, variables, versioning, etc.
Over time, this guide has become pretty comprehensive, but we don't want our contributors to have to keep re-reading it; we want to provide a quick feedback loop on issues via the pull request. Maintainers then only need to get involved for the final review, which gives every contributor the best experience.
Enter semgrep
NOTE: As my organization is using a GitHub Enterprise instance, I'm unable to use the Semgrep CI option here.
Semgrep Rules
Semgrep enables us to collect rules together in files. We can have one big file or many small ones. The structure is really up to us.
Create and maintain central rule files
The CLI lets us pass in rule files with the --config option, which the help output describes as follows. More details here.
Configuration options: [mutually_exclusive]
  -c, -f, --config TEXT
      YAML configuration file, directory of YAML
      files ending in .yml|.yaml, URL of a
      configuration file, or Semgrep registry
      entry name.
      Use --config auto to automatically obtain
      rules tailored to this project; your project
      URL will be used to log in to the Semgrep
      registry.
      To run multiple rule files simultaneously,
      use --config before every YAML, URL, or
      Semgrep registry entry name. For example
      `semgrep --config p/python --config
      myrules/myrule.yaml`
      See https://semgrep.dev/docs/writing-rules/rule-syntax
      for information on configuration file format.
Within our community, we host a repository for all our rules files with the following structure.
semgrep
semgrep/aws.yaml
semgrep/azure.yaml
semgrep/base.yaml
As you can imagine, there is one file for AWS rules, one for Azure, and the standard stuff lives in base.
Using this structure, we can check out the repo as part of our GitHub Actions workflow and execute the following command to process all the rule files.
semgrep-agent --config auto --config .semgrep-rules/semgrep/
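If a module only needs some of the rule files, the help text above says to repeat --config before each one. Here is a minimal Python sketch of building such a command. The helper name build_semgrep_command is made up for illustration, and it only assembles the argument list; it doesn't run Semgrep.

```python
def build_semgrep_command(rule_files, executable="semgrep"):
    """Return an argument list with one --config flag per rule file.

    Hypothetical helper: it only assembles the arguments and never
    invokes Semgrep itself.
    """
    command = [executable]
    for rule_file in rule_files:
        # The CLI help asks for --config before every rule file.
        command += ["--config", rule_file]
    return command


# An AWS-only module could pick the shared rules plus the AWS ones.
command = build_semgrep_command([
    ".semgrep-rules/semgrep/base.yaml",
    ".semgrep-rules/semgrep/aws.yaml",
])
print(" ".join(command))
```

Passing the whole directory, as we do in our workflow, is simpler; a list like this is only worth it when a module should skip some of the files.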
AWS Rules
Let's look at one of the AWS rules and walk through what it will trigger on.
rules:
  - id: community.terraform.aws.iam.role.description
    patterns:
      - pattern: |
          resource "aws_iam_role" "..." {...}
      - pattern-not-inside: |
          resource "aws_iam_role" "..." {description=...}
    languages:
      - hcl
    severity: WARNING
    message: All aws_iam_roles should have a description
    metadata:
      category: style guide
      technology:
        - terraform
        - aws
We give the rule a unique id, which is displayed in any output.
id: community.terraform.aws.iam.role.description
Then we match on the aws_iam_role Terraform resource.
resource "aws_iam_role" "..." {...}
Then we exclude any role that already sets a description, so only roles without one are reported.
resource "aws_iam_role" "..." {description=...}
We set the expected language, define a severity, a friendly message, and some metadata to wrap things up nicely.
Azure Rules
Here is an Azure example rule.
rules:
  - id: community.terraform.azure.provider.azurerm.features
    patterns:
      - pattern-inside: |
          provider "azurerm" {... features {...} ...}
    languages:
      - hcl
    paths:
      include:
        - versions.tf
    severity: ERROR
    message: azurerm features block should not be used within modules
    metadata:
      category: style guide
      technology:
        - terraform
        - azure
Same deal as AWS, but here we are checking versions.tf for the presence of a features block within the azurerm provider block, which modules should not define.
Let's move on to something more documentation related that we add to base.
Base Rules
The base rules file is where we add things that should be present in the documentation or that are common regardless of the public cloud provider.
Variable Description Length
This one uses a little regex to flag variable descriptions of 20 characters or fewer, so that every description ends up longer than 20 characters.
rules:
  - id: community.terraform.missing.variable.description.length
    patterns:
      - pattern: |
          variable "..." {... description="$DESC" ...}
      - metavariable-regex:
          metavariable: $DESC
          regex: (^.{0,20}$)
    languages:
      - hcl
    severity: ERROR
    message: All variables should have a description value of greater than 20 characters
    metadata:
      category: style guide
      technology:
        - terraform
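To see where the 20-character boundary falls, here is a minimal Python sketch that runs the same expression against a few made-up descriptions. Python's re module stands in for Semgrep's regex engine here, which is an assumption, and the sample descriptions are invented for illustration.

```python
import re

# Same expression as the rule's metavariable-regex.
TOO_SHORT = re.compile(r"(^.{0,20}$)")


def is_flagged(description):
    """Return True when the description has 20 characters or fewer."""
    return TOO_SHORT.search(description) is not None


# Hypothetical descriptions, chosen to sit on both sides of the limit.
for description in ["VPC ID", "x" * 20, "x" * 21, "Name of the S3 bucket"]:
    status = "flagged" if is_flagged(description) else "ok"
    print(f"{len(description):>2} chars: {status}")
```

A description of exactly 20 characters is still flagged, which lines up with the rule message asking for more than 20.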
Pinning Terraform Modules
This one uses regex to check that Terraform modules are pinned to a version, meaning the source string carries a ref=v... version tag.
rules:
  - id: community.terraform.module.pinned.version
    patterns:
      - pattern: module "..." {... source="..." ...}
      - pattern-not-regex: source\s+\=\s\"\S+ref=v\S+\"
    languages:
      - hcl
    severity: WARNING
    message: Modules should be pinned to a specific version using the version tag
    metadata:
      category: style guide
      technology:
        - terraform
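The regex is easier to reason about with a few source lines in front of it. Below is a minimal Python sketch that applies the same expression to some made-up lines. Python's re module is standing in for Semgrep's regex engine (an assumption), and the example.com URLs are placeholders, not real modules.

```python
import re

# Same expression as the rule's pattern-not-regex.
PINNED = re.compile(r'source\s+\=\s\"\S+ref=v\S+\"')


def is_pinned(line):
    """Return True when the line matches the rule's 'pinned' regex."""
    return PINNED.search(line) is not None


# Hypothetical source lines for illustration only.
samples = [
    'source = "git::https://example.com/org/module.git?ref=v1.2.0"',
    'source   = "git::https://example.com/org/module.git?ref=v1.2.0"',
    'source = "git::https://example.com/org/module.git?ref=main"',
    'source = "git::https://example.com/org/module.git"',
    'source="git::https://example.com/org/module.git?ref=v1.2.0"',
]
for line in samples:
    print("pinned  " if is_pinned(line) else "unpinned", line)
```

In this sketch the last line counts as unpinned even though it has a ref=v tag, because the expression expects whitespace on both sides of the equals sign. That is the layout terraform fmt produces, so formatted code should be fine.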
Mandatory Headings
I can't take credit for this one, as the fantastic Semgrep team provided it in the Semgrep Slack community. It flags any README heading that is not on our approved list, so contributors aren't tempted to add too many headings.
rules:
  - id: community.terraform.module.mandatory.headings
    patterns:
      - pattern-regex: |
          (?<=^|\n)## *(?!Usage\n|Inputs Notes\n|Requirements\n|Providers\n|Modules\n|Resources\n|Inputs\n|Outputs\n|Examples\n|Reference\n)[^ ][^\n]*
    paths:
      include:
        - README.md
    languages:
      - regex
    message: The following heading should be present in all modules Usage, Inputs Notes, Requirements, Providers, Modules, Resources, Inputs, Outputs, Examples, Reference
    severity: WARNING
    metadata:
      category: styleguide
      technology:
        - terraform
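Here is a rough Python sketch of what this regex picks up in a README. It is an adaptation rather than the rule itself: Python's re module rejects the variable-width lookbehind (?<=^|\n), so the sketch swaps in ^ with re.MULTILINE to anchor at the start of each line. The sample README text is invented for illustration.

```python
import re

# Headings the rule allows, taken from the rule's negative lookahead.
ALLOWED = (
    "Usage", "Inputs Notes", "Requirements", "Providers", "Modules",
    "Resources", "Inputs", "Outputs", "Examples", "Reference",
)

# Adapted expression: "^" plus re.MULTILINE replaces "(?<=^|\n)".
UNEXPECTED_HEADING = re.compile(
    r"^## *(?!" + "|".join(name + r"\n" for name in ALLOWED) + r")[^ ][^\n]*",
    re.MULTILINE,
)


def unexpected_headings(readme_text):
    """Return the '##' headings that are not on the allowed list."""
    return UNEXPECTED_HEADING.findall(readme_text)


# Hypothetical README content.
readme = (
    "# terraform-example\n\n"
    "## Usage\n\nSome text.\n\n"
    "## Changelog\n\n"
    "## Inputs\n\n"
    "## Outputs\n"
)
print(unexpected_headings(readme))
```

With this sample, only the Changelog heading is reported; Usage, Inputs, and Outputs are on the allowed list and pass through.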
GitHub Action
To finish the magic, we add the following GitHub Action workflow to all our Terraform modules to help our contributors follow the community's style.
---
name: Super Linter
# yamllint disable-line rule:truthy
on:
  pull_request:
    branches: [master, main]
jobs:
  semgrep:
    name: Semgrep Terraform Community Style Guide
    runs-on: [self-hosted]
    container:
      image: 123456789101.dkr.ecr.ap-southeast-2.amazonaws.com/returntocorp/semgrep-agent:v1
    # Skip any PR created by dependabot to avoid permission issues
    if: (github.actor != 'dependabot[bot]')
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2
      - name: Checkout Semgrep Rules
        uses: actions/checkout@v2
        with:
          repository: 'community-terraform/roadmap'
          path: .semgrep-rules
          token: ${{ secrets.PUBLIC_REPO_RO }}
      - name: semgrep-agent
        run: semgrep-agent --config auto --config .semgrep-rules/semgrep/
Observant readers will notice that we pull the semgrep-agent container from an Amazon Elastic Container Registry (ECR) repository. Check out this guide on how to sync Docker Hub containers to ECR repos.
We also have to pass in a personal access token (PAT) so the Action can access the roadmap repo that contains our Semgrep rules.
Hope this helps someone else
Cheers!