I was recently listening to a devtools.fm podcast episode in which monolithic repos were brought up, and one of the arguments for them was that it is hard to do CI/CD for lots of repos. In this post we are going to look at a couple of ways to manage CI/CD jobs for hundreds of repos with relatively little effort, using either GitHub Actions or Jenkins.

Let's talk about Jenkins. First, if you are not using shared libraries, you should be; there will be another blog post about those later. As for the plugin, we use the GitHub Branch Source Plugin, which can scan all the repos of an organization or user and, based on a file in each repo, run a particular pipeline that lives in a completely different repo. This is where some thought needs to go into a naming scheme for the files in the repos.

We use something like this:

  • Jenkinsfile.config – File in the repo that holds the environment variables loaded by the pipeline; it is also the marker file the GitHub Org plugin looks for to create the Pull Request jobs.
    • Files linked to Jenkinsfile.config
      • Jenkinsfile.S3.config – Tells the Github Org plugin in Jenkins to run the S3 jobs.
      • Jenkinsfile.NODE.config – Tells the Github Org plugin in Jenkins to run the NODE jobs.
  • Example content
    env.NODEJS = 'NodeJS-16.x'
    env.NODEJS_BUILD = 'true'
    env.NODEJS_BUILD_CMD = 'yarn --frozen-lockfile --audit'
    env.YARN_INSTALL = 'true'
    env.NODEJS_TEST_CMD = 'npm run test'
    env.SONAR_RUN_SCANNER_CLI = 'true'
    env.SONAR_SRC_DIR = '.'
    env.NODE_ENV='UNIT_TEST'
    // Define main branch
    env.MAIN_BRANCH = 'dev'
    // Define hotfix branch
    env.HOTFIX_BRANCH = 'hotfix'
    // CD Environment for auto deploy
    env.CD_ENVIRONMENT = 'dev'

Now all you need to do is create your Jenkinsfile pipelines and make sure each one reads in Jenkinsfile.config to set its environment variables.

  • Example of pipeline
    #!/usr/bin/env groovy
    
    node() {
    
      stage('Get Jenkinsfile.config') {
        // Org-wide defaults; the repo's Jenkinsfile.config overrides whatever it needs
        env.JENKINS_AGENT = 'linux'
        env.NODEJS = 'NodeJS-LTS'
        env.SONAR_CREATE_PROJECT = 'true'
        env.SONARSCANNER = 'sonar-scanner-current'
        env.SONARSCANNERMSBUILD = 'sonar-scanner-msbuild-4.5.0.1761'
        env.SONAR_GO_COVERAGE_REPORT_PATH = 'coverage.out'
        env.SONAR_GO_TEST_REPORT_PATH = 'test-report.out'
        env.SONAR_RUN_MSBUILD = 'false'
        env.SONAR_RUN_QG = 'false'
        env.SONAR_RUN_SCANNER_CLI = 'false'
        env.SONAR_VERBOSE = 'false'
        env.SONAR_SRC_DIR = '.'
        env.SONAR_LCOV_PATHS = 'coverage/lcov.info'
        env.YARN_INSTALL = 'false'
        env.TESTS_MUST_PASS = 'true'
        // Check out the repo, then load its Jenkinsfile.config over the defaults above
        checkout scm
        load "Jenkinsfile.config"
      }
    }
    
    pipeline {
      options {
        disableConcurrentBuilds()
        ansiColor('xterm')
      }
      agent { label env.JENKINS_AGENT }
    .......
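
To round out the example, here is a minimal sketch of how the rest of such a declarative pipeline might continue. It is illustrative only, not our actual shared pipeline: it assumes the Jenkins NodeJS plugin provides the tool installation named in env.NODEJS, and the stage names, conditions, and commands are invented for the sketch.

    // Illustrative sketch only - stage layout and conditions are assumptions
    pipeline {
      agent { label env.JENKINS_AGENT }
      options {
        disableConcurrentBuilds()
        ansiColor('xterm')
      }
      stages {
        stage('Build') {
          // Only build when the repo's Jenkinsfile.config asks for it
          when { environment name: 'NODEJS_BUILD', value: 'true' }
          steps {
            // nodejs() comes from the NodeJS plugin; env.NODEJS names a configured tool
            nodejs(nodeJSInstallationName: env.NODEJS) {
              sh env.NODEJS_BUILD_CMD
            }
          }
        }
        stage('Test') {
          steps {
            nodejs(nodeJSInstallationName: env.NODEJS) {
              sh env.NODEJS_TEST_CMD
            }
          }
        }
      }
    }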
    
    

Here is what the GitHub Org plugin configuration in Jenkins looks like for Pull Requests.

Here is what it looks like for builds that go to S3.


With the setup above, the developers of each repo keep some control over their CI/CD jobs, since they can easily make changes to the Jenkinsfile.config file. We use this setup for an organization with over 200 repositories, covering all of our pull request, build, and deploy jobs, with over a dozen different pipelines for S3, Node, Ansible, .NET, ASP, and Lambda. Adding a new pipeline or supporting a new language is very easy, maybe 15 minutes of extra work versus adding a custom pipeline to a single repo.



What is IaC? When did it start? What kinds of tools are there? Which one should I choose? Let's tackle each of these one by one in this post.

What is IaC?

IaC is Infrastructure as Code: writing code, scripts, and automation to bring servers, load balancers, databases, and so on up and online. Rather than logging into your favorite cloud provider's console, or your on-premise VM system, to configure and launch each resource by hand, you write code that acts like a set of directions and brings everything up with one command. Automating infrastructure this way lets you and your company scale, expand, shrink, and modify your infrastructure easily while keeping your sanity, replicate it to other data centers, and, arguably most importantly, track changes to your infrastructure in the same source control systems developers have long used to track changes in software.


When did it start?

That's pretty tough to say… do we count the bash scripts engineers have written to invoke CLI commands or APIs? If so, no one knows. Infrastructure as Code is a concept, not a specific tool: it is about provisioning resources through code rather than following a written set of instructions on how to build a server and set up a database.


What kinds of tools are out there?

I'm going to cover what I consider the top three IaC tools (for AWS). Some would add other tools to the list, but I'm focusing on the toolsets that are designed to manage infrastructure.

Terraform: My personal favorite, and the go-to for Softrams. Terraform was initially released on July 28th, 2014 by HashiCorp. I'll say it up front: I'm biased. I've used every toolset I'm about to cover, and this is by far my favorite; we will get into why later on. Terraform is written in Go, a fairly easy-to-understand language, so you can look at the code with minimal coding experience and follow what it's doing and why. It also helps that the community is huge and great: the code is well commented, and modules are well documented. It supports all the major players: AWS, Azure, Google Cloud, Oracle Cloud, VMware, and OpenStack. Terraform's large community helps it keep up with the changes the providers it supports make. Terraform doesn't get updated as fast as, say, CloudFormation, but it is fast enough for all of our use cases. The other great thing is that it's open source: if the module you're using doesn't support something, take a quick Udemy lesson on Go and write the upgrade yourself. For example, a few years ago I worked for a company that used Chef, and we needed Terraform to be able to call Chef and use an encrypted data bag. Terraform didn't support it, so I learned Go and submitted a change. Terraform also lets you organize your code as you see fit: within custom-built modules, in one HUGE file, or in thousands of tiny files; it's really up to you. Based on correct references, Terraform knows "hey, don't create the Route53 record that requires the IP address until the server is created." It can also "self-heal": kill off a resource manually and Terraform will rebuild it the next time you run it, exactly how you built it the time before.
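To make that concrete, here is a minimal, hypothetical HCL sketch (the resource names, AMI, and variables are invented, and it uses current Terraform syntax). Because the Route53 record references the instance's IP, Terraform creates the instance first without any explicit ordering.

# Hypothetical sketch: the reference creates the dependency automatically
resource "aws_instance" "app" {
  ami           = "ami-12345678"   # placeholder AMI
  instance_type = "t3.micro"
}

resource "aws_route53_record" "app" {
  zone_id = var.zone_id            # assumed to be defined elsewhere
  name    = "app.example.com"
  type    = "A"
  ttl     = 300
  records = [aws_instance.app.private_ip]
}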


CloudFormation: Do we call CloudFormation an IaC tool? Yes… kind of. CloudFormation is a service from Amazon that lets you write YAML- or JSON-based instructions (code) for how you want to launch and configure resources in AWS, via the console or a CLI command. (You can actually launch a CloudFormation stack from Terraform too.) What's great about CloudFormation is that it is updated right away: when Amazon releases a new product in AWS, the CloudFormation resources to manage it come with it. It's also well documented, of course, and has really good support behind it (AWS Support). The bad: conditionals are a bit of a pain, since they have to be written in CloudFormation's own way, which takes someone with experience. You can't do if/else the way you would in a general-purpose language; you end up stacking condition after condition, and at the end of the day it gets ugly to look at and is not very straightforward for someone who doesn't know CloudFormation to read and understand right away. It's also linear and does not make relationships between resources obvious. As a result, in a lot of cases, if you delete a resource created by a CloudFormation stack, the stack breaks; it doesn't always notice "oh, that resource in my instructions isn't there anymore, let me launch it again." Something that does stand out from the others is the Amazon-made CloudFormation Designer. This tool gives you a visual representation of your CloudFormation stack and actually lets you create and launch stacks from that visual view, a great feature for engineers who want or need a visual representation and prefer a GUI over an IDE or terminal editor.
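To make the conditional point concrete, here is a tiny hypothetical template snippet (the parameter, condition, and AMI are invented): conditions are declared in their own section and then applied per property with Fn::If, which gets verbose quickly as the logic grows.

# Hypothetical CloudFormation snippet - conditions live in their own section
Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, prod]
Conditions:
  IsProd: !Equals [!Ref Environment, prod]
Resources:
  AppInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-12345678                          # placeholder AMI
      InstanceType: !If [IsProd, m5.large, t3.micro]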


Ansible: I would call Ansible the stateless engine that could, like the saying "the little engine that could." The reason I use that phrase is that Ansible is pretty lightweight: to use it you need a terminal, an Ansible install, and some playbooks. And it doesn't just function as an IaC tool; it also works as a CM (Configuration Management) tool, which lets a team or company focus on one toolset to provision their resources and configure them down to the tiniest detail. Ansible is written in Python, a popular language known for its readability. It was released to the public on February 20th, 2012 and has a very strong following, so much so that Red Hat acquired it in 2015. Although the tool is written in Python, all the average engineer or developer needs to know is how to google and how to write YAML; many say you can get comfortable with Ansible in as little as a couple of hours. Ansible has a HUGE following and a HUGE community backing it up. It's actually Softrams' go-to configuration management toolset, but we will cover CM tools in another series. Ansible, again, is stateless, unlike CloudFormation and Terraform (yes, I know about Tower and AWX, but we're not talking about them yet). Every time you run a playbook against a server, it's as if you never ran it before: Ansible talks to the server, usually over SSH, essentially asks it questions or checks for "things," and then makes changes as needed or tells you everything is OK. Ansible's provisioners, or what it calls cloud modules, support a good number of providers such as AWS, Azure, CloudStack, Google Cloud, OpenStack, and VMware, along with a variety of smaller modules for Docker, DNS, firewalls, and more.
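To give a feel for the YAML, here is a tiny hypothetical playbook that provisions an EC2 instance with the amazon.aws collection (the key pair, AMI, and region are placeholders):

# Hypothetical playbook: provision one instance from the control machine
- name: Provision a small app server
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Launch EC2 instance
      amazon.aws.ec2_instance:
        name: demo-app
        key_name: my-keypair        # placeholder key pair
        instance_type: t3.micro
        image_id: ami-12345678      # placeholder AMI
        region: us-east-1
        state: present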


Which one should I choose?! And WHY?

For IaC we chose Terraform: it's a great tool and easy to learn.


So how do we use it? Softrams builds out infrastructure functions as modules, for example creating an ECS cluster from a custom AMI. There are a few parts here: one of our main customers, CMS, has what is known as a Gold Image, a pre-hardened version of Windows or RedHat (in this specific example RedHat) that has all the required modifications and prerequisites installed. Our ECS module has to create all the resources a cluster uses: ECS hosts (EC2 instances with the ECS agent and Docker installed), a load balancer (in our case an ALB), target groups, security groups, and IAM roles. By making a module, we can build out the set of resources that work together to form the ECS cluster as its own IaC codebase. We then just reference that module in our main Terraform configuration and pass along variables to instruct Terraform to build a unique cluster, as sketched below. This setup pairs nicely with a Terraform feature called workspaces.
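Here is a hypothetical sketch of what calling such a module can look like; the module path, variable names, and outputs are invented for illustration and are not our actual interface:

# Hypothetical module call: the module wraps the ECS hosts, ALB, target
# groups, security groups, and IAM roles described above
module "ecs_cluster" {
  source        = "./modules/ecs_cluster"    # invented path
  cluster_name  = "app-${var.env}"
  ami_id        = var.gold_image_ami_id      # the hardened Gold Image AMI
  instance_type = "m5.large"
  vpc_id        = module.vpc.vpc_id          # assumed output of the VPC module
  subnet_ids    = module.vpc.app_subnet_ids  # assumed output of the VPC module
}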


Module Layout Example:

VPC:
-rw-r--r--  dhcp_options.tf
-rw-r--r--  flow_logs.tf
-rw-r--r--  internet_gateway.tf
-rw-r--r--  nat_gateway.tf
-rw-r--r--  route53.tf
-rw-r--r--  routes.tf
-rw-r--r--  subnets.tf
-rw-r--r--  variables.tf
-rw-r--r--  vpc.tf


The fileset above is broken out by VPC function in AWS. Each file is organized the way we felt was best, with the files working together through references, as in the sketch below.
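"Working together by references" simply means a resource in one file can refer to a resource defined in another, since Terraform reads all the files in a module as one configuration. A minimal hypothetical example (resource and variable names invented):

# vpc.tf (hypothetical)
resource "aws_vpc" "main" {
  cidr_block = var.cidr
}

# subnets.tf (hypothetical) - references the VPC defined in vpc.tf
resource "aws_subnet" "app1" {
  vpc_id     = aws_vpc.main.id
  cidr_block = var.app1_cidr
}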


All you then need to do is go into your main.tf and call the module:


module "vpc" {
  source        = "./vpc"
  product       = "${var.product}"
  env           = "${var.env}"
  cidr          = "${var.cidr}"
  data1_cidr    = "${var.data1_cidr}"
  data2_cidr    = "${var.data2_cidr}"
  app1_cidr     = "${var.app1_cidr}"
  app2_cidr     = "${var.app2_cidr}"
  dmz1_cidr     = "${var.dmz1_cidr}"
  dmz2_cidr     = "${var.dmz2_cidr}"
  dns_suffix    = "${var.dns_suffix}"
  build         = "${var.build_vpc}"
  build_route53 = "${var.build_route53}"
}


On to workspaces: they allow you to create separate states, in our case separate environments, that can use different instance sizes, different IP ranges, or even skip building some modules depending on the need, all while still using the same code base and the same modules, just with a separate set of variables (see https://www.terraform.io/docs/state/workspaces.html).
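Here is a hedged sketch of how a workspace can drive those differences; the variable map and sizes are invented:

# Hypothetical: pick settings based on the current workspace name
variable "instance_sizes" {
  type = map(string)
  default = {
    dev  = "t3.small"
    prod = "m5.large"
  }
}

locals {
  instance_type = var.instance_sizes[terraform.workspace]
}

You switch between them with terraform workspace new dev or terraform workspace select prod, and each workspace keeps its own state.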

Lastly, Terraform is versatile and works across multiple cloud providers. Once you learn HCL, or read through the extensive documentation, which almost always includes an example, it's very easy to use and very feature rich. You can also call your CM toolset, such as Ansible, Chef, or Puppet, from Terraform, so once a resource gets created you can seamlessly have Terraform kick off a CM run against it to configure it.
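One simple way to wire that together is a local-exec provisioner that runs a playbook once the resource exists. Here is a hypothetical sketch (the AMI, SSH user, and playbook path are placeholders, and other patterns, such as running the CM tool from the pipeline instead, work just as well):

# Hypothetical: kick off an Ansible playbook after the instance is created
resource "aws_instance" "app" {
  ami           = "ami-12345678"   # placeholder AMI
  instance_type = "t3.micro"

  provisioner "local-exec" {
    command = "ansible-playbook -i '${self.private_ip},' -u ec2-user site.yml"
  }
}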


While Kubernetes provides a robust platform for managing container workloads, it can become cumbersome to manage, especially at scale. How do you manage deployments? How do you ensure consistency across environments? How do you roll back or revert changes? How do you make it easy on developers? More and more frequently these days, the answer is GitOps (https://www.weave.works/technologies/gitops/). GitOps is a methodology for managing Kubernetes in which Git is the single source of truth. In this post I will go into some of our experiences implementing GitOps.

The first decision is which tool(s) to use to implement GitOps. There are a variety of them out there, each with different pros and cons. Some of the big names are ArgoCD (https://argoproj.github.io/argo-cd/), Flux (https://fluxcd.io/), and Jenkins X (https://jenkins-x.io/).

Flux: A super simple tool, but by the same token somewhat limited in functionality. The biggest drawback for what we wanted to do is that each installation can only monitor a single repo, so for every repo you want to deploy you would need a separate installation of Flux. That is desirable in certain situations, but not what we were looking for.

Jenkins X: Where Flux goes simple, Jenkins X veers in the complete opposite direction, providing an entire CI/CD platform for managing build, test, packaging, image storage, and deployment using Cloud Native projects. If you are looking for one tool to handle the entire pipeline, then Jenkins X is worth a look. Yet for all its components, it is still lacking in multi-tenancy and may require multiple installations.

ArgoCD: It has a nice combination of features without being overly complex or restrictive. You can add multiple repos with different levels of automation, and one installation can even control deployments to multiple clusters.

ArgoCD seemed like a good fit for us. The project provides manifests for installation (https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml), so we used those as a template and, with a few modifications, added them to the IaC that stands up our K8s clusters through Terraform. The modifications were mainly to the argocd-cm ConfigMap that holds ArgoCD's settings. First, we added the ability to specify repo credentials so we can access our private repositories; since the manifest is rendered by Terraform, the credentials loop below uses Terraform template directives.

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
  name: argocd-cm
data:
  repository.credentials: |
%{ for credential in repository_credentials ~}
    - url: ${credential.url}
      passwordSecret:
        name: ${credential.secret}
        key: password
      usernameSecret:
        name: ${credential.secret}
        key: username
%{ endfor ~}

Among the core components we wanted to install with ArgoCD was the Istio Operator, to provide a service mesh. For Istio we needed to first install the Istio Operator Helm chart and then deploy an IstioOperator Kubernetes object. We had trouble getting things to run in the correct order, since the IstioOperator CRD isn't available until after the Istio Operator Helm chart has been applied.

ArgoCD lets you specify sync waves (https://argoproj.github.io/argo-cd/user-guide/sync-waves/#how-do-i-configure-waves) to control the order in which syncs are run, and we were able to use them to specify the correct order of deployment. But even with sync waves specified, we noticed that ArgoCD would progress to the next wave before all the Istio resources were created by the Operator, which caused issues when we tried to specify Istio Gateways for ingress in later waves. After doing some research we stumbled upon https://nemo83.dev/posts/argocd-istio-operator-bootstrap/, which had our answer: we needed to set up a custom health check so that ArgoCD wouldn't mark the IstioOperator healthy until all the Istio resources were deployed.
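For reference, sync waves are just annotations on the manifests ArgoCD applies; here is a hypothetical example (the wave number and resource names are ours to choose):

# Hypothetical: this resource syncs in wave 1, after anything in wave 0
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-control-plane
  namespace: istio-system
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  profile: default

The custom health check itself went into the argocd-cm ConfigMap, alongside the repository credentials: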

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
  name: argocd-cm
data:
  repository.credentials: |
%{ for credential in repository_credentials ~}
    - url: ${credential.url}
      passwordSecret:
        name: ${credential.secret}
        key: password
      usernameSecret:
        name: ${credential.secret}
        key: username
%{ endfor ~}

  resource.customizations: |
    install.istio.io/IstioOperator:
      health.lua: |
        hs = {}
        if obj.status ~= nil then
          if obj.status.status == "HEALTHY" then
            hs.status = "Healthy"
            hs.message = "IstioOperator Ready"
            return hs
          end
        end

        hs.status = "Progressing"
        hs.message = "Waiting for IstioOperator"
        return hs

That got us past the ordering issues, and we could now get everything deployed through GitOps. But the MutatingWebhookConfiguration kept going out of sync immediately, because Istio changes its caBundle outside of Git. That behavior is expected, but ArgoCD didn't know that. Luckily, ArgoCD provides a way to handle these situations (https://argoproj.github.io/argo-cd/user-guide/diffing/), so we were able to update the ConfigMap to ignore those differences.


apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
  name: argocd-cm
data:
  repository.credentials: |
%{ for credential in repository_credentials ~}
    - url: ${credential.url}
      passwordSecret:
        name: ${credential.secret}
        key: password
      usernameSecret:
        name: ${credential.secret}
        key: username
%{ endfor ~}

  resource.customizations: |
    admissionregistration.k8s.io/ValidatingWebhookConfiguration:
      # List of json pointers in the object to ignore differences
      ignoreDifferences: |
        jsonPointers:
        - /webhooks/0/clientConfig/caBundle
        - /webhooks/0/clientConfig/failurePolicy
    admissionregistration.k8s.io/v1beta1/ValidatingWebhookConfiguration:
      # List of json pointers in the object to ignore differences
      ignoreDifferences: |
        jsonPointers:
        - /webhooks/0/clientConfig/caBundle
        - /webhooks/0/clientConfig/failurePolicy
    admissionregistration.k8s.io/MutatingWebhookConfiguration:
      # List of json pointers in the object to ignore differences
      ignoreDifferences: |
        jsonPointers:
        - /webhooks/0/clientConfig/caBundle
        - /webhooks/0/clientConfig/failurePolicy
    install.istio.io/IstioOperator:
      health.lua: |
        hs = {}
        if obj.status ~= nil then
          if obj.status.status == "HEALTHY" then
            hs.status = "Healthy"
            hs.message = "IstioOperator Ready"
            return hs
          end
        end

        hs.status = "Progressing"
        hs.message = "Waiting for IstioOperator"
        return hs

We finally had a working implementation of ArgoCD, but now what? We wanted to make it as easy as possible to onboard new applications to the cluster and for each group to have their own repos for their applications. The app of apps (https://argoproj.github.io/argo-cd/operator-manual/declarative-setup/#app-of-apps) methodology seemed perfect.

During cluster bootstrapping we specify a repo that stores app manifests. ArgoCD monitors that repo and automatically applies any changes, so if we want to add a new application to the cluster, we just create its app manifest in the repo and ArgoCD deploys it. The developers only have to worry about the manifests for their own applications, which can live in the same repo as their code.
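An app manifest in that repo is just an ArgoCD Application resource; here is a hypothetical example (the repo URL, path, and namespaces are placeholders):

# Hypothetical Application manifest picked up by the app-of-apps repo
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sample-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/sample-app.git   # placeholder repo
    targetRevision: main
    path: deploy/manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: sample-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true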

Having Git as the single source of truth for the cluster means we don't have to worry about undocumented manual changes that never make it into automation. ArgoCD will immediately flag any sync issues and, if auto sync is enabled, revert to what is in Git. Continuous deployment is also super easy, without having to rewrite pipelines for each application we onboard. So far our experience with GitOps has been a good one, and we look forward to exploring ways to leverage its benefits more fully.
