Searching GitHub Organisations

👋 Hey there!

As a DevOps 🧑‍💻 team grows, so does the number of repositories. If you use Infrastructure as Code and automation tools like Terraform or Ansible, you will likely have many repos that map to reusable modules. The modules are then combined to deliver full deployments. 🚀

๐Ÿ‘ Best practice involves independent versioning of each module, but this introduces challenges at times.

Terraform Module Versions

One of those challenges 🏋️‍♂️ is moving workflows or deployments from one version control system to another. Every reference to another module needs to be identified, and how you find them depends on the strategy you are using.

If you are using something like terraform-docs, this is nicely represented in the default README.md markdown that gets generated.

But if you pull modules in via a submodule, these references become harder to see.

We are migrating from GitHub Enterprise Server to GitHub Enterprise Managed Users, which I'll do a separate post on; however, we have a few challenges to work through for our organisations.

  • Which deployments refer to other modules within the organisation?
  • Are there any submodules used?
  • Which ones have pipelines associated or other dependencies linked that we need to update?

After giving it some thought 💭, manually auditing all the repositories would waste my valuable time ⏰. So, instead, I decided to use the GitHub CLI 🚀 to complete the task more efficiently. 😎

NOTE: I had inconsistent results using the gh search code command; hence, I went down this route.

GitHub CLI

Installation

I won't go into the installation of the CLI as the doco is excellent for that and is available here.

On a Mac, it is as simple as

brew install gh

GHES Host

As I'm using an instance of GitHub Enterprise running on AWS, I need to set the host in my shell.

export GH_HOST="ghes.io"

NOTE: It took me ages to find this in the doco, so the only reason I put it here is to be able to copy it next time I forget.
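One detail worth checking: gh runs as a child process of the shell, so it only sees GH_HOST if the variable is exported. Here is a minimal sketch of the difference, using `sh -c` as a stand-in for gh (ghes.io is the placeholder host from above):

```shell
#!/usr/bin/env bash
# Start clean so the demo does not depend on the current environment.
unset GH_HOST

# Plain assignment: visible in this shell only, not in child processes.
GH_HOST="ghes.io"
seen_without_export=$(sh -c 'echo "${GH_HOST:-unset}"')

# Exported: inherited by child processes, which is what gh needs.
export GH_HOST
seen_with_export=$(sh -c 'echo "${GH_HOST:-unset}"')

echo "without export: $seen_without_export"
echo "with export:    $seen_with_export"
```

Putting the export line in your shell profile saves looking it up again.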

Now that the basics are covered, let's get to the good stuff: the meat of this article - 🥩 extensions.

Extensions

To quote the doco here, GitHub CLI extensions are repositories that provide additional gh commands.

For my purpose, I want to perform a few actions within each repo to identify where it may have external references to other repos and identify other dependencies.

These can all be created as functions within my extension.

  • Search .gitmodules and extract the URL
  • Search for standard directories that show there is a linked pipeline
  • Search for git::https in terraform files and extract the URL
  • Search for GitHub Actions workflow files

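The two URL-extraction steps in the list above might be sketched like this. It is a minimal sketch, not the code from the gh-dep-search repository: the function names `submodule_urls` and `terraform_git_urls` are hypothetical, and it assumes module sources are written as `git::https://...` in `*.tf` files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not taken from the gh-dep-search script.

# Print the URL of every submodule declared in .gitmodules, one per line.
submodule_urls() {
  local repo_dir="${1:-.}"
  [ -f "$repo_dir/.gitmodules" ] || return 0
  # Keep only the value of each "url = ..." line.
  sed -n 's/^[[:space:]]*url[[:space:]]*=[[:space:]]*//p' "$repo_dir/.gitmodules"
}

# Print each distinct git::https source found in Terraform files,
# without the git:: prefix and without any ?ref=... suffix.
terraform_git_urls() {
  local repo_dir="${1:-.}"
  grep -rhoE --include='*.tf' 'git::https://[^"?[:space:]]+' "$repo_dir" |
    sed 's/^git:://' |
    sort -u
}
```

Run inside a cloned repo, `submodule_urls .` and `terraform_git_urls .` each print one URL per line, or nothing when there are no matches.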
The entire script is available at https://github.com/sjramblings/gh-dep-search

The output provides a line in CSV format, allowing me to compile the results quickly in Excel for analysis and later planning.
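The directory and workflow checks, together with the pipe-joined CSV fields, could be sketched as follows. The helper names are hypothetical and the list of marker directories is an assumption based on the sample output later in this post. The real script may check for different ones and emits more columns than this sketch does.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not taken from the gh-dep-search script.

# Join lines from stdin into a single pipe-separated field.
join_pipe() {
  paste -sd'|' -
}

# Print the marker directories that exist in the repo root.
# The directory names are assumptions based on the sample output.
marker_dirs() {
  local repo_dir="${1:-.}" d
  for d in Pipelines ansible terraform cloudformation; do
    [ -d "$repo_dir/$d" ] && echo "$d"
  done
  return 0
}

# Print the file names of any GitHub Actions workflows.
workflow_files() {
  local repo_dir="${1:-.}" f
  for f in "$repo_dir"/.github/workflows/*.yml "$repo_dir"/.github/workflows/*.yaml; do
    [ -f "$f" ] && basename "$f"
  done
  return 0
}

# Print three CSV fields: repo name, directories, workflows.
csv_fields() {
  local repo_dir="${1:-.}"
  printf '%s,%s,%s\n' \
    "$(basename "$(cd "$repo_dir" && pwd)")" \
    "$(marker_dirs "$repo_dir" | join_pipe)" \
    "$(workflow_files "$repo_dir" | join_pipe)"
}
```

Joining multi-value fields with `|` keeps each repo on a single CSV row, which is what makes the result easy to filter in Excel.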

Installation

To install the extension, you point to the repository as shown.

NOTE: The repository name must start with the gh- prefix, and your script or binary, whatever language it is written in, can't have a file extension. I started with .py on mine and got an error.
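To illustrate the naming rule, here is a minimal sketch that scaffolds a script-based extension in a temporary directory and checks its layout. The name gh-hello and the function `check_extension_layout` are hypothetical. The sketch also relies on the gh requirement that the executable at the repository root has the same name as the repository.

```shell
#!/usr/bin/env bash
# Hypothetical example: scaffold a script-based extension called gh-hello.
ext_name="gh-hello"
work_dir=$(mktemp -d)
mkdir "$work_dir/$ext_name"

# The executable sits at the repository root, shares the repository's
# name, and has no file extension (gh-hello, not gh-hello.py).
cat > "$work_dir/$ext_name/$ext_name" <<'EOF'
#!/usr/bin/env bash
echo "Hello from a gh extension"
EOF
chmod +x "$work_dir/$ext_name/$ext_name"

# Check the layout before publishing the repository.
check_extension_layout() {
  local dir="$1" name
  name=$(basename "$dir")
  case "$name" in
    gh-*) ;;
    *) echo "repository name must start with gh-" >&2; return 1 ;;
  esac
  [ -x "$dir/$name" ] || { echo "missing executable $name" >&2; return 1; }
}

check_extension_layout "$work_dir/$ext_name" && echo "layout ok"
```

From inside such a directory, `gh extension install .` installs the extension locally, which is handy for testing before pushing.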

$ gh extension install sjramblings/gh-dep-search

Cloning into '/Users/sjramblings/.local/share/gh/extensions/gh-dep-search'...

remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 14 (delta 3), reused 8 (delta 2), pack-reused 0
Receiving objects: 100% (14/14), 4.05 KiB | 4.05 MiB/s, done.
Resolving deltas: 100% (3/3), done.

✓ Installed extension sjramblings/gh-dep-search

Workflow

Here is the workflow for my extension.

  • check out all the repos
gh repo list org-name --limit 1000 | while read -r repo _; do
  # Strip the owner: "org-name/repo" becomes "repo"
  repo_name="${repo##*/}"
  echo "$repo_name"
  gh repo clone "$repo" "$repo_name"
done
  • run my extension
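for dir in */; do
  # Skip Python virtual environments
  case "$dir" in *venv*) continue ;; esac
  # The subshell leaves the working directory unchanged, even if cd fails
  (cd "$dir" && gh dep-search)
done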
  • report the output
Repo,Dependency Count,URLs,Directories,Workflows
repo1,1,No URLs found,cloudformation
repo2,0,No URLs found,
repo3,3,https://ghes.io/org-name/ansible-role-repo4|https://ghes.io/org-name/terraform-aws-repo5|https://ghes.io/org-name/terraform-aws-repo6,Pipelines|ansible|terraform
repo7,0,No URLs found,,superlinter.yml|housekeeping.yml|release-drafter.yml

Finally, I grab all the output and dump it in a CSV file with my headers.

Excel CSV
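The final step, writing the header once and appending each repo's line, might look like the sketch below. The function `collect_csv` is hypothetical. It takes the command to run as arguments so the loop can be tried with something other than gh, and the column names follow the sample output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a header, then append one line per repo.
# Usage: collect_csv OUT_FILE COMMAND [ARGS...]
collect_csv() {
  local out_file="$1" dir
  shift
  echo "Repo,Dependency Count,URLs,Directories,Workflows" > "$out_file"
  for dir in */; do
    [ -d "$dir" ] || continue
    # Skip Python virtual environments, as in the loop above.
    case "$dir" in *venv*) continue ;; esac
    # Run the command inside the repo; the subshell keeps the
    # caller's working directory unchanged.
    (cd "$dir" && "$@") >> "$out_file"
  done
}
```

Run from the directory that holds the clones, for example `collect_csv results.csv gh dep-search`, then open results.csv in Excel.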

Summary

It was fun using GitHub CLI extensions for this post. It would have taken me hours to trawl through each repo and uncover the information above. Instead, I can run this easily for any organisation targeted for migration.

Hope this helps someone else!

Cheers

Did you find this article valuable?

Support Stephen Jones by becoming a sponsor. Any amount is appreciated!
