Using Semgrep to keep track of your community's style

Semgrep

Semgrep is a fantastic tool. As the website rightly states:

Static analysis at ludicrous speed
Find bugs and enforce code standards

We use it in our open source community, where standards apply not just to code but also to documentation, and we wrap them all up in a style or contribution guide.

The Style Guide

As open source communities evolve and grow with many contributors, standards usually need to be defined so that maintainers aren't constantly commenting on pull requests asking for style-related fixes.

Within our Terraform community, we have tackled this by defining a style guide that covers naming, syntax, variables, versioning, etc.

Over time this guide has become pretty comprehensive, but we don't want our contributors to have to keep re-reading it; we want to provide a quick feedback loop on issues via the pull request. Maintainers then only need to get involved for the final review, which provides the best experience for contributors.

Enter Semgrep

NOTE: As my organization uses a GitHub Enterprise instance, I'm unable to use the Semgrep CI option here.

Semgrep Rules

Semgrep lets us collect rules together in files. We can have one big file or many small ones; the structure is really up to us.

Create and maintain central rule files

The CLI's configuration options let us pass in rule files using --config:

  Configuration options: [mutually_exclusive]
    -c, -f, --config TEXT  YAML configuration file, directory of YAML
                           files ending in .yml|.yaml, URL of a
                           configuration file, or Semgrep registry
                           entry name.

                           Use --config auto to automatically obtain
                           rules tailored to this project; your project
                           URL will be used to log in to the Semgrep
                           registry.

                           To run multiple rule files simultaneously,
                           use --config before every YAML, URL, or
                           Semgrep registry entry name. For example
                           `semgrep --config p/python --config
                           myrules/myrule.yaml`

                           See https://semgrep.dev/docs/writing-rules/rule-syntax
                           for information on configuration file format.

Within our community, we host a repository for all our rule files with the following structure.

semgrep
semgrep/aws.yaml
semgrep/azure.yaml
semgrep/base.yaml

As you can imagine, there is one file for AWS rules, one for Azure, and one for the common stuff in base.

Using this structure, we can check out the repo as part of our GitHub Actions workflow and execute the following command to process all the rule files.

semgrep-agent --config auto --config .semgrep-rules/semgrep/
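The directory form above relies on Semgrep picking up every .yml or .yaml file in that folder. If you'd rather pass each rule file explicitly, the help text says to repeat --config before every file. Here's a minimal Python sketch that builds that argument list; the helper name build_semgrep_args is hypothetical, it assumes the plain semgrep CLI rather than semgrep-agent, and it only looks at files directly inside the directory (no subfolders).

```python
from pathlib import Path


def build_semgrep_args(rules_dir: str, extra_configs=("auto",)) -> list[str]:
    """Build a semgrep command line with one --config flag per rule file.

    Hypothetical helper: only files directly inside rules_dir that end in
    .yml or .yaml are included, sorted so the order is stable between runs.
    """
    args = ["semgrep"]
    # Registry-style configs first, e.g. "--config auto".
    for cfg in extra_configs:
        args += ["--config", cfg]
    rule_files = sorted(
        p for p in Path(rules_dir).iterdir()
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    )
    # One --config per rule file, as the CLI help text describes.
    for path in rule_files:
        args += ["--config", str(path)]
    return args
```

You could hand the resulting list to subprocess.run; listing the files explicitly makes the exact set of rule files visible in the CI log.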

AWS Rules

Let's look at one of the AWS rules and walk through what it will trigger on.

rules:
  - id: community.terraform.aws.iam.role.description
    patterns:
      - pattern: |
          resource "aws_iam_role" "..." {...}
      - pattern-not-inside: |
          resource "aws_iam_role" "..." {description=...}
    languages:
      - hcl
    severity: WARNING
    message: All aws_iam_roles should have a description
    metadata:
      category: style guide
      technology:
        - terraform
        - aws

We give the rule a unique id, which is displayed in any output.

id: community.terraform.aws.iam.role.description

Then we match on the aws_iam_role Terraform resource.

resource "aws_iam_role" "..." {...}

Then pattern-not-inside excludes any role that already sets a description, so only roles without one are reported.

resource "aws_iam_role" "..." {description=...}

We set the expected language, define a severity, a friendly message, and some metadata to wrap things up nicely.
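To make the pattern and pattern-not-inside pairing concrete, here's a rough Python approximation of what the rule reports: find every aws_iam_role block, then drop the ones that contain a description. This is only an illustration, not how Semgrep works (Semgrep parses the HCL properly); the function name find_roles_without_description is hypothetical, and the naive brace counting will be confused by braces inside strings or heredocs.

```python
import re

# Start of an aws_iam_role resource block, capturing the resource name.
ROLE_START = re.compile(r'resource\s+"aws_iam_role"\s+"([^"]+)"\s*\{')
# A description attribute anywhere in the block body (nested blocks included).
DESCRIPTION = re.compile(r"^\s*description\s*=", re.MULTILINE)


def find_roles_without_description(hcl: str) -> list[str]:
    """Return names of aws_iam_role blocks that have no description.

    Crude approximation: braces are counted naively, so braces inside
    strings or heredocs will throw it off.
    """
    missing = []
    for match in ROLE_START.finditer(hcl):
        depth, i = 1, match.end()
        # Walk forward until the block's opening brace is balanced.
        while i < len(hcl) and depth:
            if hcl[i] == "{":
                depth += 1
            elif hcl[i] == "}":
                depth -= 1
            i += 1
        body = hcl[match.end():i - 1]
        # "pattern" matched; keep it only if "pattern-not-inside" would not.
        if not DESCRIPTION.search(body):
            missing.append(match.group(1))
    return missing
```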

Azure Rules

Here is an Azure example rule.

rules:
  - id: community.terraform.azure.provider.azurerm.features
    patterns:
      - pattern-inside: |
          provider "azurerm" {... features {...} ...}
    languages:
      - hcl
    paths:
      include:
        - versions.tf
    severity: ERROR
    message: azurerm features block should not be used within modules
    metadata:
      category: style guide
      technology:
        - terraform
        - azure

Same deal as AWS, but here we check for the presence of a features block within the azurerm provider block, and the paths setting limits the check to versions.tf.

Let's move on to something more documentation-related that we add to base.

Base Rules

The base rules file is where we add things that should be present in the documentation or that are common regardless of the public cloud provider.

Variable Description Length

This one uses a little regex to check that every variable description is longer than 20 characters.

rules:
  - id: community.terraform.missing.variable.description.length
    patterns:
      - pattern: |
          variable "..." {... description="$DESC" ...}
      - metavariable-regex:
          metavariable: $DESC
          regex: (^.{0,20}$)
    languages:
      - hcl
    severity: ERROR
    message: All variables should have a description value of greater than 20 characters
    metadata:
      category: style guide
      technology:
        - terraform
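Before committing a regex like this, it's worth checking it against a few sample values. This sketch runs the rule's regex unchanged through Python's re module; it assumes $DESC is bound to the description text without its surrounding quotes, and the helper name description_too_short is hypothetical. Anything of 20 characters or fewer matches, which is what triggers the finding.

```python
import re

# The exact regex from the rule: a whole value of 0 to 20 characters.
TOO_SHORT = re.compile(r"(^.{0,20}$)")


def description_too_short(desc: str) -> bool:
    """True if the rule would flag this description (20 characters or fewer)."""
    return TOO_SHORT.search(desc) is not None


for desc in ["VPC ID", "The ID of the VPC to deploy resources into"]:
    status = "flagged" if description_too_short(desc) else "ok"
    print(f"{status:8} {desc!r}")
```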

Pinning Terraform Modules

This one uses regex to check that Terraform modules are pinned to a version.

rules:
  - id: community.terraform.module.pinned.version
    patterns:
      - pattern: module "..." {... source="..." ...}
      - pattern-not-regex: source\s+\=\s\"\S+ref=v\S+\"
    languages:
      - hcl
    severity: WARNING
    message: Modules should be pinned to a specific version using the version tag
    metadata:
      category: style guide
      technology:
        - terraform
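The same trick works for the pinning rule. The sketch below applies the pattern-not-regex unchanged to a few source lines, so you can see which ones are treated as pinned (no finding) and which get flagged. The github.example.com URLs and the is_pinned helper are hypothetical.

```python
import re

# The exact regex from the rule's pattern-not-regex.
PINNED = re.compile(r'source\s+\=\s\"\S+ref=v\S+\"')


def is_pinned(source_line: str) -> bool:
    """True if the line satisfies pattern-not-regex, i.e. no finding."""
    return PINNED.search(source_line) is not None


samples = [
    'source = "git::https://github.example.com/org/terraform-aws-vpc.git?ref=v1.2.0"',
    'source = "git::https://github.example.com/org/terraform-aws-vpc.git?ref=main"',
    'source = "git::https://github.example.com/org/terraform-aws-vpc.git"',
]
for line in samples:
    print("pinned " if is_pinned(line) else "FLAGGED", line)
```

Note that the \s+ before the = requires at least one space, so an unformatted source="..." would be flagged even with a ref=v tag; running terraform fmt first avoids that. A registry module pinned only through a separate version argument would also be flagged, since the regex only looks for ref=v in the source string.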

Mandatory Headings

I can't take credit for this one; the fantastic Semgrep team provided it over in the Semgrep Slack community. It flags any second-level heading that isn't on the approved list, so contributors can't get overly keen on adding extra headings.

rules:
  - id: community.terraform.module.mandatory.headings
    patterns:
      - pattern-regex: |
          (?<=^|\n)## *(?!Usage\n|Inputs Notes\n|Requirements\n|Providers\n|Modules\n|Resources\n|Inputs\n|Outputs\n|Examples\n|Reference\n)[^ ][^\n]*
    paths:
      include:
        - README.md
    languages:
      - regex
    message: The following heading should be present in all modules Usage, Inputs Notes, Requirements, Providers, Modules, Resources, Inputs, Outputs, Examples, Reference
    severity: WARNING
    metadata:
      category: styleguide
      technology:
        - terraform
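Here's a way to try the heading regex locally. Python's re module rejects the variable-width look-behind (?<=^|\n), so this sketch swaps in a MULTILINE ^, which likewise means "start of the string or just after a newline"; the rest of the pattern is unchanged, minus the trailing newline the YAML block scalar adds. The helper name unapproved_headings is hypothetical.

```python
import re

# Same pattern as the rule, except (?<=^|\n) becomes a MULTILINE "^",
# because Python's re only allows fixed-width look-behinds.
UNAPPROVED_HEADING = re.compile(
    r"^## *(?!Usage\n|Inputs Notes\n|Requirements\n|Providers\n|Modules\n"
    r"|Resources\n|Inputs\n|Outputs\n|Examples\n|Reference\n)[^ ][^\n]*",
    re.MULTILINE,
)


def unapproved_headings(readme: str) -> list[str]:
    """Return each heading line the rule would report."""
    return [m.group(0) for m in UNAPPROVED_HEADING.finditer(readme)]


readme = "# Module\n\n## Usage\n\n## Changelog\n\n### Details\n\n## Outputs\n"
print(unapproved_headings(readme))
```

Worth knowing: as written, the pattern also flags ### subheadings, because [^ ] can match the third #.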

GitHub Action

To finish the magic, we add the following GitHub Actions workflow to all our Terraform modules to help contributors follow the community's style.

---
name: Super Linter

# yamllint disable-line rule:truthy
on:
  pull_request:
    branches: [master, main]

jobs:
  semgrep:
    name: Semgrep Terraform Community Style Guide
    runs-on: [self-hosted]

    container:
      image: 894702234348.dkr.ecr.ap-southeast-2.amazonaws.com/returntocorp/semgrep-agent:v1

    # Skip any PR created by dependabot to avoid permission issues
    if: (github.actor != 'dependabot[bot]')
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Checkout Semgrep Rules
        uses: actions/checkout@v2
        with:
          repository: 'community-terraform/roadmap'
          path: .semgrep-rules
          token: ${{ secrets.PUBLIC_REPO_RO }}

      - name: semgrep-agent
        run: semgrep-agent --config auto --config .semgrep-rules/semgrep/

The observant ones will notice that we pull the semgrep-agent container from an AWS Elastic Container Registry (ECR) repository rather than Docker Hub; the image is synced from Docker Hub into ECR.

We also have to pass in a personal access token (PAT), stored in the PUBLIC_REPO_RO secret, so the workflow can check out the roadmap repo that contains our Semgrep rules.

Hope this helps someone else!

Cheers!