Optimizing Your CI/CD Pipeline with GitHub Actions: Identifying Issues and Implementing Solutions

This guide explores how GitHub Actions enhances both the development process and business efficiency. It's essential for developers wanting better workflows and leaders seeking improved productivity.

A well-functioning CI/CD pipeline is crucial for rapid, secure, and reliable software delivery. However, several factors can slow down the process, introduce instability, and reduce its overall efficiency.

This can have a significant impact on the development team's productivity, the quality of the software, and the overall delivery process. Identifying these issues and implementing solutions is essential to optimize your CI/CD pipeline for speed, stability, and efficiency.

Key Benefits of an Optimized CI/CD Pipeline

Efficiency and Productivity

  • Time to Market: Reduces delays in releasing new features and bug fixes.
  • Developer Productivity: Minimizes long build times and unreliable tests, boosting developer morale.
  • Operational Efficiency: Streamlines operations and reduces manual intervention.
  • Cost Efficiency: Eliminates wasted resources and reduces cloud costs.

Quality and Reliability

  • Quality: Stabilizes pipelines, reducing bugs and issues in production.
  • Resilience: Helps your team recover quickly from failures and outages.
  • Scalability: Scales with your business as it grows.

Team and Business Impact

  • Competitive Advantage: Delivers features faster and more reliably than competitors.
  • Customer Satisfaction: Ensures faster delivery of new features and bug fixes.
  • Team Morale: Prevents demotivation and burnout caused by slow and unreliable pipelines.
  • Innovation: Encourages experimentation and innovation.

Compliance and Collaboration

  • Compliance and Security: Shifting security left in your CI/CD pipeline reduces security vulnerabilities and compliance risks. By integrating security checks early in the development process, you can catch and fix issues before they reach production. This proactive approach not only improves the security of your application but also saves time and resources by preventing costly security breaches and compliance violations.
  • Visibility: Provides insight into the health and performance of your software delivery process.
  • Collaboration: Fosters collaboration and communication within your team.

Part 1: Identifying Issues in Your CI/CD Pipeline

Codebase Size: Impact on CI/CD Pipeline Performance

As the codebase expands, the time required for building, testing, and deploying also increases. Large codebases can lead to extended execution times at each stage of the pipeline, slowing down the overall process and delaying the delivery of new features and bug fixes.

Illustration
stateDiagram
    state Codebase-Size {
    Codebase --> Build: Large codebase increases build time
    Codebase --> Test: Extensive testing required for large codebase
    Codebase --> Deploy: Longer deployment times for large codebase
    }
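
One practical mitigation is to limit how much of a large repository each job checks out. actions/checkout fetches only a single commit by default, and its sparse-checkout input can restrict the checkout to the directories a job actually needs. A minimal sketch, where the src and pom.xml paths are placeholders for your own layout:

Example script: Shallow and Sparse Checkout for a Large Codebase
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Fetch only the latest commit (the default); avoid fetch-depth: 0
          # unless a step genuinely needs the full history
          fetch-depth: 1
          # Check out only the paths this job needs (placeholder paths)
          sparse-checkout: |
            src
            pom.xml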

Inefficient Build Scripts

Build scripts that are not optimized can significantly slow down the pipeline. These scripts may include unnecessary dependencies and inefficient steps, or lack caching. By optimizing these scripts, you can reduce the time taken for each build and make your pipeline more efficient.

Illustration
stateDiagram
    state Inefficient_Build_Scripts {
        state Build {
            [*] --> Build_Process: Start
            Build_Process --> [*]: End
        }
        Build --> Test: Slow build times delay testing
        Build --> Deploy: Inefficient build scripts slow down deployment
        Build --> Transparency: Inefficient scripts obscure the build process
        Build --> Troubleshooting: Inefficient scripts make troubleshooting difficult
    }
    Transparency --> Bottleneck: Lack of transparency slows down the pipeline
    Troubleshooting --> Bottleneck: Difficulties with troubleshooting slow down the pipeline

Long-running or Unreliable Tests

Tests that take a long time to run or are unreliable can increase the execution time of the pipeline and introduce instability. These tests can also delay the feedback developers receive, slowing down the development process. It's crucial to optimize tests to ensure they are reliable and run within a reasonable time frame.

Illustration
stateDiagram
    state Long-running-or-Unreliable-Tests {
    Test --> Feedback: Long-running tests delay feedback
    Test --> Pipeline: Unreliable tests introduce instability
    }
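
Beyond optimizing the tests themselves, a simple guardrail is to bound how long they can run, so a hung or flaky suite fails fast instead of blocking the pipeline. A minimal sketch, assuming an npm-based test suite as a placeholder:

Example script: Bounding Test Execution Time
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 20    # fail the whole job if it exceeds this limit
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: npm test
        timeout-minutes: 10    # fail just this step if the suite hangs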

Lack of Security Checks Early in the Pipeline

Security checks that are not integrated early in the pipeline can lead to vulnerabilities and compliance risks. This can cause the following issues:

  • Security vulnerabilities in the codebase
  • Compliance violations
  • Increased risk of data breaches
  • Delayed detection of security issues
  • Reliance on slow, manual security checks
  • Lack of visibility into security vulnerabilities
  • Vulnerabilities going undetected and reaching the production environment
  • Discovering vulnerabilities late in the pipeline can be costly and time-consuming to fix

Lack of Parallelism

Running pipeline tasks sequentially can be inefficient. By running jobs in parallel, you can speed up the pipeline and improve overall efficiency. This approach allows multiple tasks to be executed simultaneously, reducing the total time taken for the pipeline to complete.

Illustration
stateDiagram
    state "CI/CD Pipeline" as Pipeline {
        state "Build" as Build
        state "Test" as Test {
            state "Unit Tests" as UnitTests {
                state "Test 1" as Test1
                state "Test 2" as Test2
                state "Test 3" as Test3
            }
        }
        state "Deploy" as Deploy
    }
    state "Lack of Parallelism" as LackParallelism {
        state "Longer Build Times" as LongBuild
        state "Sequential Testing" as SeqTest {
            state "Unit Tests Run Sequentially" as UnitTestSeq {
                state "Test 1" as SeqTest1
                state "Test 2" as SeqTest2
                state "Test 3" as SeqTest3
            }
        }
        state "Slower Deployments" as SlowDeploy
    }
    Pipeline --> LackParallelism: Lack of parallelism slows down the pipeline
    LackParallelism --> LongBuild: Longer build times due to sequential execution
    LackParallelism --> SeqTest: Testing runs sequentially due to waiting for build completion
    SeqTest --> SeqTest1: Test 1 runs
    SeqTest1 --> SeqTest2: Test 2 waits for Test 1 to complete
    SeqTest2 --> SeqTest3: Test 3 waits for Test 2 to complete
    LackParallelism --> SlowDeploy: Slower deployments due to waiting for tests to complete

Manual Approvals

Manual steps in the pipeline can create bottlenecks and slow down the process. Waiting for manual approvals can introduce delays in the deployment process. While manual approvals are sometimes necessary for compliance and security reasons, it's important to streamline these processes and automate them where possible.

Illustration
stateDiagram
    state "CI/CD Pipeline" as Pipeline {
        state "Continuous Integration" as CI {
            state "Automated Testing" as AutoTest
            state "Manual Testing" as ManualTest
        }
        state "Software Release" as Release {
            state "Manual Approvals" as ManualApproval
            state "Manual Deployment" as ManualDeploy
        }
    }
    state "Unnecessary Manual Steps" as ManualSteps {
        state "Long Waits for Approvals" as LongWaits
        state "Lack of Automated Testing" as LackAutoTest
        state "Manual Deployment Delays" as ManualDeployDelays
    }
    Pipeline --> ManualSteps: "Identify Inefficiencies"
    ManualSteps --> LongWaits: "Wait for Approval"
    ManualSteps --> LackAutoTest: "Perform Manual Testing"
    ManualSteps --> ManualDeployDelays: "Perform Manual Deployment"

Limited Resources

If the runners executing your workflows have insufficient CPU, memory, or storage, it can lead to slowness and instability within the pipeline. Ensuring your runners have adequate resources can help maintain a stable and efficient pipeline.

Illustration - Effect of Limited Resources on the CI/CD Pipeline
stateDiagram
    LimitedResources: Limited Resources
    InsufficientCPU: Insufficient CPU
    InsufficientMemory: Insufficient Memory
    Build: Build
    Test: Test
    Deploy: Deploy
    LimitedResources --> InsufficientCPU: Insufficient CPU slows down task execution
    LimitedResources --> InsufficientMemory: Insufficient memory causes task failures
    InsufficientCPU --> Build: Build tasks are slowed down
    InsufficientCPU --> Test: Test tasks are slowed down
    InsufficientCPU --> Deploy: Deploy tasks are slowed down
    InsufficientMemory --> Build: Build tasks fail due to out of memory
    InsufficientMemory --> Test: Test tasks fail due to out of memory
    InsufficientMemory --> Deploy: Deploy tasks fail due to out of memory

External Dependencies

External dependencies, such as APIs and services, that are slow or unreliable can slow down and destabilize the pipeline if they are not effectively managed. It's important to ensure these dependencies are reliable and have adequate performance to prevent them from becoming a bottleneck in your pipeline.

Examples
  • Third-Party APIs: If your application relies on third-party APIs, any slowdown or instability in these APIs can directly impact your pipeline. For instance, if you're using an API to fetch data for testing, and the API is slow, your tests will take longer to run, slowing down the entire pipeline.

  • Database Services: If your application connects to a database service, the performance of that service can impact your pipeline. For example, if the database service is slow to respond because of high load or network issues, your build or deployment process may be delayed.

  • Cloud Services: If you're using cloud services like Azure, AWS, or Google Cloud, any issues with these services can impact your pipeline. For instance, a slowdown in the service or a misconfigured private network can slow down your builds or deployments.

  • Package Repositories: If you're using package repositories like npm or Maven, any issues with these repositories can impact your pipeline. For example, if the repository is slow to respond or has network issues, it can slow down your builds as your build process waits for packages to download.

Illustration
graph TD
    A[Start CI/CD Pipeline] --> B{Check External Dependencies}
    B -->|Long API Calls| C[API Testing Stage: Slowed down due to long response times from third-party APIs]
    B -->|Slow Database Connection| D[Database Migration Stage: Slowed down due to slow database connections]
    B -->|Slow Cloud Services| E[Build and Deployment Stages: Slowed down due to slow response times from cloud services]
    B -->|Slow Package Repositories| G[Package Download Stage: Slowed down due to slow response times from package repositories]
    C --> F[End CI/CD Pipeline: Potential delay in pipeline completion]
    D --> F
    E --> F
    G --> F

Improper Configuration Management

Poor configuration management can lead to errors and inconsistencies across different environments. This can make troubleshooting difficult and slow down the pipeline. Effective configuration management ensures consistency across all environments and makes it easier to identify and fix issues.

Illustration
stateDiagram
    state Improper_Configuration_Management {
    Configuration --> Build: Inconsistent configurations cause build errors
    Configuration --> Test: Incorrect configurations lead to test failures
    Configuration --> Deploy: Misconfigured environments cause deployment issues
    Configuration --> Troubleshooting: Poor configuration management makes troubleshooting difficult
    }

Lack of Monitoring

Without proper monitoring tools, it can be difficult to identify and troubleshoot pipeline problems promptly, which delays issue resolution and leads to downtime and reduced productivity. Monitoring provides real-time insights into the health and performance of your pipeline, allowing you to quickly identify and address bottlenecks and inefficiencies.

Examples
  • Performance and Resource Monitoring: Monitoring the performance of your pipeline, including build times, test durations, deployment speeds, and resource usage (such as CPU, memory, and storage), can help you identify areas for improvement, optimize resource allocation, and prevent bottlenecks.
  • Error Tracking and Alerting: Tracking errors and setting up alerts for critical issues or failures in your pipeline can help you quickly identify, address, and prevent issues from escalating, thereby improving reliability and reducing downtime.
  • Logging and Visualization: Logging pipeline activities and visualizing performance metrics can provide insights into the efficiency and reliability of your pipeline, aid in troubleshooting, and track progress effectively.
  • Historical Data and Predictive Analytics: Storing historical data about your pipeline performance and leveraging it to predict potential issues can help you track trends, make data-driven decisions, and proactively address problems to improve reliability and efficiency.
  • Integration and Automation: Integrating monitoring tools with your pipeline and automating tasks like alerting and reporting can provide seamless visibility into pipeline performance, streamline processes, and ensure timely responses to issues.
  • Scalability and Compliance: Ensuring that your monitoring tools can scale with your pipeline and monitor activities for compliance with security, privacy, and regulatory requirements is critical for maintaining visibility, control, data integrity, and protecting sensitive information.
  • Collaboration and Continuous Improvement: Sharing monitoring data with team members can foster collaboration and collective problem-solving. Using this data to identify areas for improvement and implementing changes can drive continuous improvement in pipeline management.
  • Real-time Insights: Monitoring pipeline performance in real-time can provide immediate insights into issues, enabling rapid response and resolution to minimize downtime and disruptions.

Illustration
stateDiagram
    state Lack_of_Monitoring {
    Pipeline --> Monitoring: Lack of visibility into pipeline performance
    Monitoring --> Bottlenecks: Unable to identify and address bottlenecks
    Monitoring --> Troubleshooting: Difficulties in troubleshooting pipeline issues
    }

Large Deployments

Deploying large artifacts, such as application packages, can significantly slow down the pipeline. These large deployments can also increase the risk of errors and failures. Optimizing the deployment process to reduce the size of artifacts and speed up the deployment process is crucial for maintaining an efficient pipeline.

Examples
  • Large Application Packages: Deploying large application packages can slow down the deployment process, increase resource consumption, and introduce errors. Optimizing the size of these packages can help reduce deployment times and improve pipeline efficiency.
  • Data Migration: Migrating large datasets or databases as part of the deployment process can be time-consuming and resource-intensive. Implementing efficient data migration strategies, such as incremental updates or parallel processing, can help speed up the deployment process and reduce downtime.
  • Configuration Changes: Making extensive configuration changes during deployment can introduce complexity and increase the risk of errors. Implementing automated configuration management tools and version control practices can help streamline the deployment process and ensure consistency across environments.
  • Dependency Management: Managing dependencies, such as libraries, frameworks, and third-party services, can impact the deployment process. Ensuring that dependencies are up to date, properly managed, and efficiently deployed can help reduce deployment times and improve pipeline reliability.
  • Environment Setup: Setting up deployment environments, such as servers, databases, and networking configurations, can be time-consuming and error-prone. Using infrastructure as code (IaC) tools and automation scripts to provision and configure environments can help standardize the deployment process and reduce manual intervention.
  • Rollback and Recovery: Deploying large artifacts increases the risk of deployment failures and errors. Implementing rollback and recovery strategies, such as blue-green deployments or canary releases, can help mitigate risks and ensure smooth deployment processes.
  • Container Images: When deploying containerized applications, not optimizing container images, neglecting multi-stage builds, and failing to leverage container registries can lead to larger image sizes, slower deployment speeds, and reduced pipeline efficiency.

Improper Version Control: Impact on the CI/CD Pipeline

A messy codebase with poor branching strategies and lack of code reviews can make it difficult to track changes and identify issues. Improper version control practices can lead to conflicts, errors, and instability in the pipeline and collaboration process within the team. Adopting proper version control practices is an essential part of a DevOps culture and can significantly improve the stability and efficiency of your pipeline.

Examples
  • Branching Strategies: Not having a clear branching strategy, such as feature branching, release branching, or trunk-based development, can lead to conflicts, delays, and errors in the pipeline. Implementing a consistent branching strategy can help streamline development, testing, and deployment processes.
  • Code Reviews: Skipping code reviews or having inconsistent code review practices can introduce bugs, security vulnerabilities, and quality issues in the codebase. Implementing code reviews as part of the development process can help catch and fix issues early, improve code quality, and ensure consistency across the team.
  • Tagging Releases: Not tagging releases or using inconsistent versioning practices can make it difficult to track changes, identify issues, and deploy updates. Implementing version control practices, such as semantic versioning, tagging releases, and maintaining a changelog, can help streamline the release process and ensure consistency across different environments.
  • Automation: Failing to automate version control tasks, such as branching, tagging, and merging, can introduce manual errors, delays, and inconsistencies in the pipeline. Implementing automation tools and scripts to manage version control tasks can help reduce manual intervention, improve efficiency, and ensure accuracy in the pipeline.
  • Collaboration: Not having clear guidelines, permissions, and workflows for version control can lead to confusion, conflicts, and inefficiencies in the team. Establishing collaboration practices, such as code ownership, pull request reviews, and branching policies, can help foster teamwork, improve communication, and streamline the development process.
  • Documentation: Neglecting to document version control practices, policies, and workflows can lead to misunderstandings, errors, and inconsistencies in the team. Maintaining clear and up-to-date documentation for version control practices can help onboard new team members, ensure compliance with standards, and promote best practices within the team.

Part 2: Implementing Solutions to Optimize Your CI/CD Pipeline

Inefficient Build Scripts: Bad and Good Examples

When implementing build scripts in your workflows, consider the following best practices to optimize your build process:

  • Use caching to store dependencies and build artifacts between workflow runs. This can significantly reduce build times by avoiding redundant downloads and builds.
  • Minimize the number of dependencies and steps in your build process. Remove unnecessary steps, dependencies, and build tools to streamline the process.
  • Use incremental builds to only rebuild parts of your application that have changed. This can save time by avoiding full rebuilds for minor changes.
  • Optimize your build scripts for performance. Use parallelization, caching, and efficient build tools to speed up the build process.
  • Use GitHub Actions' matrix strategy to run multiple build jobs in parallel. This can help distribute the workload and speed up the overall build process.
  • Make sure your build scripts are idempotent and reproducible. This ensures that builds are consistent and reliable across different environments.
  • If possible, use ephemeral runners for disposable build environments. This helps isolate builds and avoids conflicts between builds running on the same runner, making the process more deterministic and reliable.
Bad example: CI - Non-Optimized Build Times
name: CI - Non-Optimized Build Times

on:
  push:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      # No caching is used, which means dependencies are downloaded every time
      - name: Install dependencies (Bad Practice)
        run: mvn install

        # Explanation: This approach downloads all dependencies afresh on every build, 
        # regardless of whether they have changed. This can significantly slow down 
        # build times, especially for projects with large dependencies.

      # The build process is not optimized, it does not use incremental builds or parallelization
      - name: Build with Maven (Bad Practice)
        run: mvn -B package --file pom.xml

        # Explanation: This example uses a basic `mvn package` command. Consider using 
        # Maven features like incremental builds or specific goals (e.g., `mvn clean install`) 
        # to optimize the build process and only rebuild what's necessary. Additionally, 
        # explore parallelization options offered by Maven or third-party actions for 
        # multi-threaded builds if applicable.

      # Long-running steps without a timeout can cause the workflow to hang or eventually fail
      - name: Long running build step (Bad Practice)
        run: ./long-running-script.sh

        # Explanation: This step lacks a timeout configuration. If the script takes 
        # excessively long to execute, the entire workflow could hang indefinitely. 
        # Implement a timeout using the `timeout` keyword within the `run` command 
        # to prevent such failures. 
        # (e.g., timeout 5m ./long-running-script.sh) 
Good example: CI - Optimized Build Times
name: CI - Optimized Build Times

on:
  push:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      # Cache dependencies to speed up builds
      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          # Path to the dependencies that need to be cached
          path: ~/.m2/repository
          # Key for the cache. The cache will be saved with this key.
          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
          # If the key doesn't match any existing cache,
          # the action falls back to caches whose keys match restore-keys
          restore-keys: |
            ${{ runner.os }}-maven-

      - name: Set up JDK 17
        uses: actions/setup-java@v4
        with:
            java-version: '17'
            distribution: 'temurin'
            cache: maven
      - name: Build with Maven
        run: mvn -B package --file pom.xml
        timeout-minutes: 10

        # Explanation: This example caches Maven dependencies to speed up builds.
        # It also uses the `actions/setup-java` action to set up the JDK version.
        # The build command is optimized to use the `-B` flag for batch mode and
        # specify the `package` goal. This approach can help improve build times
        # and optimize the build process.
        # A timeout is also set so that long-running steps cannot hang the workflow indefinitely.
        # Replacing custom scripts with built-in actions can further standardize the workflow
        # and provide better error handling and transparency.

Important: It's recommended to avoid using custom scripts that aren't optimized for the build process. Instead, leverage the built-in features of your build tools, or utilize third-party actions specifically designed for performance and efficiency. This approach can significantly streamline your build process and reduce build times.

Unlocking Efficiency with Parallel Jobs in GitHub Actions

One of GitHub Actions' most powerful features is its support for parallel job execution. This means you can run multiple jobs within your workflow simultaneously, significantly accelerating your CI/CD pipeline.

This capability shines particularly bright in scenarios like running various types of tests: unit tests, integration tests, and end-to-end tests. As long as these tests are independent (don't rely on each other's outputs), parallelization can dramatically reduce your pipeline's overall execution time.

But the benefits extend beyond testing. Parallel job execution is also ideal for deploying to multiple environments concurrently, allowing you to streamline your deployment process.

Furthermore, by leveraging the matrix strategy, you can define different configurations for your jobs and execute them in parallel. This offers even greater flexibility for optimizing your pipeline's speed and efficiency.

In essence, parallel job execution in GitHub Actions empowers you to:

  • Dramatically reduce pipeline execution time by running independent tasks concurrently.
  • Expedite testing by parallelizing independent tests.
  • Streamline deployments by deploying to multiple environments simultaneously.
  • Optimize your pipeline by leveraging the matrix strategy to define and run multiple job configurations in parallel.

Example 1a: Parallel Testing with Unit Tests (NodeJS/NPM)
name: CI - Unit Tests (Parallel with NPM)

on:
  push:
    branches: [ master ]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        component: [ "api", "ui", "services" ]
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js environment
        uses: actions/setup-node@v3
        with:
          node-version: 16

      - name: Install dependencies
        run: npm install

      - name: Run unit tests (component: ${{ matrix.component }})
        run: npm test -- --coverage --coverageReporters=json-summary,lcov --coverageDirectory=coverage/${{ matrix.component }}

# This workflow defines a single job named "unit-tests".
# The strategy section uses a matrix to run the job three times, once for each project component ("api", "ui", "services").
# Each job run utilizes the npm test command with additional flags for code coverage, specifying the output directory based on the tested component.
# Since these tests are likely independent (assuming no component tests rely on another's setup or data), running them concurrently reduces overall test execution time.
Example 1b: Parallel Testing with Unit Tests (Java/Maven)
name: CI - Unit Tests (Parallel with Maven)

on:
  push:
    branches: [ main ]
  pull_request:
    branches:
        - main
        - feature/*
        - develop

jobs:
  unit-parallel-tests:
    # the unit-parallel-tests job runs on an ubuntu-latest GitHub-hosted runner
    name: UNIT-PARALLEL-TESTS
    runs-on: ubuntu-latest
    needs: # the build and runner-indexes jobs must complete before this job runs
      - build
      - runner-indexes
    container:
      image: mrkostin/maven:3.6.0-alpine-git-curl-jq # running the job in a container - mrkostin/maven:3.6.0-alpine-git-curl-jq
    services:
      # postgres service container
      postgres: # service name - postgres. This name is used to access the service container from the job container as the host name.
        image: postgres # postgres image from Docker Hub - https://hub.docker.com/_/postgres
        env:
          POSTGRES_PASSWORD: postgres # setting the password for the postgres database
        # exposing port 5432 of the postgres service container to the host machine
        ports:
          - 5432:5432
      # redis service container for caching session data
      redis: # service name - redis. This name is used to access the service container from the job container as the host name.
        image: redis # redis image from Docker Hub - https://hub.docker.com/_/redis
        # exposing port 6379 of the redis service container to the host machine
        ports:
          - 6379:6379
    # defining the job permissions
    permissions:
      contents: read # read access to the repository contents
      packages: write # write access to the repository packages
      id-token: write # write access to the repository id token
    strategy: # running the job in parallel using the matrix strategy and the runner-indexes job output
      fail-fast: true # cancels all in-progress jobs if any matrix job fails - https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstrategyfail-fast
      matrix: # run the job in parallel on the number of GitHub-hosted runners defined in the total_runners environment variable
        runner-index: ${{ fromJson(needs.runner-indexes.outputs.json) }} # using the runner-indexes job output to define the matrix
    steps:
      - name: Checkout repository # checkout the repository
        uses: actions/checkout@v3.0.2
      # caching the maven packages to speed up the build process.
      # Link to the documentation - https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows
      - name: Cache Maven packages
        uses: actions/cache@v3 # using the actions/cache@v3 action to cache the maven packages
        with:
          path: /root/.m2 # path to cache
          key: ${{ runner.os }}-junit-${{ hashFiles('**/pom.xml') }} # key for restoring and saving the cache
          restore-keys: ${{ runner.os }}-junit- # key for restoring the cache if no exact match is found
      # In this step, we download the latest test-results artifact from the build job and store it in the container
      - run: |
          # Fetch the ids of previous "Test Results" artifacts from the GitHub API, using jq to parse the JSON response, and store them in artifacts_list.txt
          curl \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
            -H "X-GitHub-Api-Version: 2022-11-28" \
            https://api.github.com/repos/${{ github.repository }}/actions/artifacts | jq -r '.artifacts | sort_by(.created_at) | .[] | select(.name == "Test Results") | .id' > artifacts_list.txt

          LATEST_ARTIFACT_NUMBER=$(cut -d: -f 2 artifacts_list.txt | sort -n | tail -n 1)

          curl \
              -H "Accept: application/vnd.github+json" \
              -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
              -H "X-GitHub-Api-Version: 2022-11-28" \
              -L -o my_artifact.zip \
              https://api.github.com/repos/${{ github.repository }}/actions/artifacts/"${LATEST_ARTIFACT_NUMBER}"/zip

          # Unzip the artifact to the test_results directory
          mkdir test_results
          unzip my_artifact.zip -d test_results 2> /dev/null || true

      # Set up a tmate session for ssh debugging if ssh_debug_enabled is set to true
      - name: Setup tmate session
        uses: mxschmitt/action-tmate@v3
        if: ${{ github.event.inputs.ssh_debug_enabled == 'true' }}

      # split-tests action - splits the tests into x number of groups
      # based on the total number of GitHub-hosted runners and previous junit test results, by timing or line count.
      # Link to the action - https://github.com/marketplace/actions/split-tests
      - uses: chaosaffe/split-tests@v1-alpha.1
        id: split-tests
        name: Split tests
        with:
          glob: src/test/**/**/*.java # glob pattern to match the test files
          split-total: ${{ env.total_runners }} # total number of GitHub-hosted runners
          split-index: ${{ matrix.runner-index }} # current runner index
          junit-path: test_results/*xml # path to the junit test results from previous runs (wildcards match all files), used to split by timing
          line-count: false # split based on the junit timing data rather than line count
      # run the tests assigned to this runner by looping through the test-suite output of the split-tests action
      - run: 'echo "This runner will execute the following tests: ${{ steps.split-tests.outputs.test-suite }}"'
      - run: |
          LIST="${{ steps.split-tests.outputs.test-suite }}"
          for file in $LIST
          do
            # sleep for 10 seconds to avoid timeout errors
            sleep 10
            mvn -Dtest=$(basename $file | sed -e "s/.java/,/" | tr -d '\r\n') -e test
          done

      - uses: actions/upload-artifact@v3 # upload the test results as an artifact
        with:
          name: Test Results
          path: ./target/surefire-reports # path to the test results
          retention-days: 90 # artifact retention period in days - https://docs.github.com/en/actions/guides/storing-workflow-data-as-artifacts#about-workflow-artifact-retention

Note:

  • For a complete example script click here
  • The strategy section uses the matrix strategy to run the job in parallel with the number of GitHub-hosted runners defined in the total_runners environment variable.
  • The runner-indexes job output is used to define the matrix strategy, ensuring that the job runs concurrently on multiple runners.
  • The split-tests action splits the tests into groups based on the total number of GitHub-hosted runners and the previous test results. This ensures that tests are distributed evenly across the runners.
  • The for loop on each runner executes the subset of tests assigned to it by the split-tests action.
  • The test results are uploaded as an artifact for further analysis and reporting.
Example 2: Parallel Deployment to Staging and Production Environments - often combined with Blue-Green Deployment
name: Deploy (Parallel)

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [ staging, production ]
    steps:
      - uses: actions/checkout@v4

      # Build and package your application here (replace with your build steps)
      - name: Build application
        run: ./build.sh

      - name: Deploy to environment (environment: ${{ matrix.environment }})
        uses: your-deployment-action@v1
        with:
          environment: ${{ matrix.environment }}
          # Add other deployment specific arguments here

# This workflow defines a single job named "deploy".
# Similar to the previous example, the strategy section uses a matrix to run the deployment job twice, once for each environment ("staging", "production").
# Each job run utilizes a custom deployment action (your-deployment-action@v1) with an argument specifying the target environment.
# Assuming the deployment scripts are independent (no dependencies between environments), this approach allows for concurrent deployment to both environments, potentially saving time.

Note: Use this code with caution, as deploying to production in parallel with staging can have unintended consequences. Ensure that your deployment process is designed to handle parallel deployments and that it won't cause conflicts or issues. A safer sequential alternative is sketched below.
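
If parallel deployment to production is too risky for your process, a common alternative is to chain the environments with the needs keyword so that production deploys only after staging succeeds. A minimal sketch under that assumption (deploy.sh is a placeholder for your own deployment command):

Example script: Sequential Staging-then-Production Deployment
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to staging
        run: ./deploy.sh staging   # placeholder deployment command

  deploy-production:
    needs: deploy-staging          # runs only after staging succeeds
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        run: ./deploy.sh production   # placeholder deployment command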

Example 3: Parallel Matrix Strategy for Monorepo Build and Test
name: CI - Monorepo Build and Test (Parallel)

# This workflow is triggered every time code is pushed to the main branch
on:
    push:
        branches: [ main ]

jobs:
    build-and-test:
        # This job runs on the latest version of Ubuntu
        runs-on: ubuntu-latest
        strategy:
            # The matrix strategy allows you to run the job multiple times with different configurations.
            # Each service (frontend, backend, database) is built and tested in parallel on a separate runner.
            # The build and test commands below are placeholders; replace them with your own tooling.
            matrix:
                include:
                    - service: frontend
                      build-command: npm run build
                      test-command: npm test
                    - service: backend
                      build-command: mvn -B package
                      test-command: mvn test
                    - service: database
                      build-command: echo "No build step for the database service"
                      test-command: liquibase update
        steps:
            # This step checks out your repository so your job can access it
            - uses: actions/checkout@v4

            # This step caches dependencies to avoid re-downloading on each build
            # This can significantly speed up your builds
            - uses: actions/cache@v3
              with:
                  # Placeholder cache path; point this at each service's dependency cache
                  path: ${{ matrix.service }}/build-cache
                  key: ${{ runner.os }}-build-${{ matrix.service }}
                  restore-keys: |
                      ${{ runner.os }}-build-

            # This step builds the service using the command selected by the matrix
            - name: Build (service: ${{ matrix.service }})
              run: ${{ matrix.build-command }}

            # This step tests the service using the command selected by the matrix
            - name: Test (service: ${{ matrix.service }})
              run: ${{ matrix.test-command }}

Note:

  • The matrix strategy is used to run the job for three different services (frontend, backend, and database) in parallel on separate runners.
  • The actions/cache action is used to cache dependencies to avoid re-downloading on each build, which can significantly speed up your builds.
  • The include entries in the matrix select a service-specific build and test command for each job run; the commands shown are placeholders that should be replaced with the tooling your project actually uses.
  • Explore the GitHub Actions Marketplace for a wider range of actions that cater to various technologies and testing frameworks.
Example 4: Parallel Matrix Strategy for Multi-Platform Testing
name: CI - Multi-Platform Testing (Parallel)

# This workflow is triggered every time code is pushed to the main branch
on:
    push:
        branches: [ main ]
    
jobs:
    test:
        # This job runs on the latest versions of Ubuntu, macOS, and Windows
        runs-on: ${{ matrix.os }}
        strategy:
            # The matrix strategy allows you to run the job multiple times with different configurations. This means that the job will be run on three different operating systems in parallel.
            matrix:
                os: [ ubuntu-latest, macos-latest, windows-latest ]
        steps:
            # This step checks out your repository so your job can access it
            - uses: actions/checkout@v4

            # This step installs Node.js and npm
            - name: Setup Node.js
              uses: actions/setup-node@v3
              with:
                  node-version: 14

            # This step installs dependencies
            - name: Install dependencies
              run: npm install

            # This step runs tests
            - name: Run tests
              run: npm test
Illustration - Parallel Testing
stateDiagram
    state Continuous-Integration {
        state "Run independent tests concurrently for faster feedback" as Parallel-Testing {
            UnitTests --> UnitTest1: Each test runs in \na separate environment
            UnitTests --> UnitTest2: Tests are independent \nof each other
            UnitTests --> UnitTest3: 
            UnitTests --> UnitTest4: 
            UnitTests --> UnitTest..N:  
        }
    }

Manual Approvals

GitHub Actions supports manual triggers and approvals through workflow_dispatch and environment protection rules, allowing you to automate these steps and streamline your pipeline. By defining manual approval steps in your workflows, you can ensure that critical changes are reviewed and approved before deployment, without introducing unnecessary delays. This can help maintain a balance between automation and human oversight in your CI/CD process.

In general, Continuous Deployment (CD) is the practice of automatically deploying every change that passes the CI pipeline to production. Many organizations have adopted Continuous Deployment to streamline the deployment process and reduce manual intervention. However, it requires a high level of automation, testing, and confidence in the CI/CD pipeline to ensure that changes are safe to deploy to production.

In some cases, however, manual approvals and manual triggers are still necessary to ensure that critical changes are reviewed and approved before deployment.

Example script: Manual Approval for Deployment
on:
    workflow_dispatch:
        inputs:
            environment:
                description: 'Environment to deploy to'
                required: true
                default: 'staging'
jobs:
    deploy:
        runs-on: ubuntu-latest
        environment: ${{ github.event.inputs.environment }}
        steps:
        - name: Checkout code
          uses: actions/checkout@v4
        - name: Deploy
          run: make deploy

Limited Resources

GitHub Actions provides various types of runners with different resource configurations to suit your needs, including self-hosted runners and larger GitHub-hosted runners for more resource-intensive workflows.

Example script: Using a Self-Hosted Runner
jobs:
    build:
        runs-on: self-hosted
        steps:
        - name: Checkout code
          uses: actions/checkout@v4
        - name: Build
          run: make build
Example script: Using a GitHub-Hosted large runner - 16 core
jobs:
  check-bats-version:
    runs-on:
      labels: ubuntu-20.04-16core
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '14'
      - run: npm install -g bats
      - run: bats -v

External Dependencies

GitHub Actions can handle retries and timeouts for better reliability, and you can use third-party actions from the GitHub Marketplace to mock or stub out external services for more reliable testing.

Example script 1: Retrying External API Calls
steps:
- name: Call external API
  uses: nick-invision/retry@v2
  with:
      timeout_minutes: 10
      max_attempts: 3
      command: curl https://api.example.com

Note:

  • The nick-invision/retry action is used to retry the curl command up to 3 times with a timeout of 10 minutes.
  • This approach can help handle intermittent failures or network issues when calling external APIs.
Example script 2: Mocking External Services
steps:
- name: Checkout code
  uses: actions/checkout@v4
- name: Setup Mock Server
  # Placeholder reference - substitute a mock-server action from the Marketplace
  uses: mock-server-action@v1
  with:
      port: 3000
      routes: 'mocks/*.json'
- name: Run Tests
  run: npm test

Note:

  • The mock-server-action reference is a placeholder for a mock-server action from the GitHub Marketplace; it sets up a mock server on port 3000 with routes defined in the mocks/*.json files.
  • This approach allows you to mock external services during testing to ensure reliable and consistent results.

Improper Configuration Management

GitHub Actions integrates well with most Infrastructure as Code (IaC) tools, allowing you to manage your environments and configurations as code and ensure consistency across your pipeline.

Example script: Applying Terraform Configuration
steps:
- name: Checkout code
  uses: actions/checkout@v4
- name: Setup Terraform
  uses: hashicorp/setup-terraform@v1
- name: Initialize Terraform
  run: terraform init
- name: Apply Terraform configuration
  run: terraform apply -auto-approve

Lack of Monitoring

GitHub Actions provides comprehensive logs for each run, and integrates with many monitoring tools, allowing you to track the health and performance of your pipeline in real time.

Example script: Sending Event to Datadog
steps:
- name: Checkout code
  uses: actions/checkout@v4
- name: Setup Datadog
  uses: DataDog/github-action-datadog@v1
  with:
      # Store your Datadog keys as repository secrets; the secret names here are examples
      api-key: ${{ secrets.DATADOG_API_KEY }}
      app-key: ${{ secrets.DATADOG_APP_KEY }}
- name: Send event to Datadog
  run: |
      curl -X POST "https://api.datadoghq.com/api/v1/events" \
      -H "Content-Type: application/json" \
      -H "DD-API-KEY: ${{ secrets.DATADOG_API_KEY }}" \
      -H "DD-APPLICATION-KEY: ${{ secrets.DATADOG_APP_KEY }}" \
      -d '{
              "title": "Workflow run completed",
              "text": "The workflow run for ${{ github.workflow }} has completed.",
              "priority": "normal",
              "tags": ["workflow:${{ github.workflow }}", "run:${{ github.run_id }}"],
              "alert_type": "info"
      }'
Example script: Sending Event to Splunk
steps:
- name: Checkout code
  uses: actions/checkout@v4
- name: Setup Splunk
  uses: splunk/splunk-connect-action@v1
  with:
      # Store your Splunk endpoint and token as repository secrets; the secret names here are examples
      splunk-url: ${{ secrets.SPLUNK_URL }}
      splunk-token: ${{ secrets.SPLUNK_TOKEN }}
- name: Send event to Splunk
  run: |
      curl -X POST ${{ secrets.SPLUNK_URL }} \
      -H "Authorization: Splunk ${{ secrets.SPLUNK_TOKEN }}" \
      -H "Content-Type: application/json" \
      -d '{
              "event": {
                  "message": "Workflow run completed",
                  "workflow": "${{ github.workflow }}",
                  "run_id": "${{ github.run_id }}"
              }
      }'

Shifting Left with Security Scans

GitHub Actions integrates with various security scanning tools, allowing you to shift security checks left in your pipeline and identify vulnerabilities early in the development process.

CodeQL is a powerful static analysis tool that can help you find security vulnerabilities, bugs, and other issues in your code. By integrating CodeQL scans into your CI/CD pipeline, you can identify and fix security issues early in the development process, for example during everyday code review on pull requests. This improves the overall security of your application, reduces the risk and cost of vulnerabilities reaching production, and, from a pipeline-optimization perspective, cuts the time spent on manual security reviews.

Example script: Running CodeQL Analysis
steps:
- name: Checkout code
  uses: actions/checkout@v4

- name: Setup CodeQL
  uses: github/codeql-action/init@v3
  with:
      languages: java

- name: Autobuild
  uses: github/codeql-action/autobuild@v3

- name: Run CodeQL analysis
  uses: github/codeql-action/analyze@v3
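
To actually shift these scans left, trigger them on every pull request (and optionally on a schedule) rather than only on pushes to the default branch. A minimal trigger sketch that could sit at the top of the workflow above:

Example script: Triggering CodeQL on Pull Requests and a Schedule
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 6 * * 1'   # weekly scan on Mondays at 06:00 UTC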

Large Deployments

GitHub Actions supports artifact uploads and downloads, allowing you to store and retrieve large files efficiently. You can also use actions from the GitHub Marketplace to optimize your artifacts and reduce their size.

Example script: Archiving Production Artifacts
steps:
- name: Checkout code
  uses: actions/checkout@v4
- name: Build
  run: make build
- name: Archive production artifacts
  uses: actions/upload-artifact@v2
  with:
      name: dist
      path: dist/
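
The uploaded artifact can then be retrieved in a later job, so the deployment stage reuses the build output instead of rebuilding it. A minimal sketch, assuming the upload above happens in a job named build and using a placeholder deploy command:

Example script: Downloading Production Artifacts in a Later Job
jobs:
  deploy:
    needs: build   # wait for the job that uploaded the artifact
    runs-on: ubuntu-latest
    steps:
    - name: Download production artifacts
      uses: actions/download-artifact@v2
      with:
          name: dist
          path: dist/
    - name: Deploy
      run: ./deploy.sh   # placeholder deployment command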

Improper Version Control

Improper version control practices can lead to conflicts, errors, and instability in the pipeline and in the team's collaboration process. Adopting proper version control practices is an essential part of a DevOps culture and can significantly improve the stability and efficiency of your pipeline. For example, using branching strategies, pull requests, and code reviews helps maintain a clean and efficient version control workflow; GitHub Flow is one such example. Additionally, practices like tagging releases, using semantic versioning, and automating versioning can streamline the release process and ensure consistency across different environments.
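
Tagging and releasing can themselves be automated. A minimal sketch that creates a GitHub release whenever a semantic-version tag is pushed, using the GitHub CLI preinstalled on GitHub-hosted runners:

Example script: Creating a Release from a Semantic-Version Tag
on:
  push:
    tags:
      - 'v*.*.*'   # e.g. v1.2.3

jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # required to create a release
    steps:
      - uses: actions/checkout@v4
      - name: Create GitHub release
        run: gh release create "$GITHUB_REF_NAME" --generate-notes
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}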

Taking advantage of the GitHub Actions Marketplace

The GitHub Actions Marketplace gives you access to a wide range of community-created actions and workflows that can help you automate tasks, improve efficiency, and streamline your development process. These actions cover use cases from building and testing to deployment and monitoring. By integrating the ones that fit your specific needs into your workflows, you can save time, reduce manual intervention, and improve the overall efficiency of your CI/CD pipeline.

Reusing Workflows

Reusing workflows across different projects can help standardize your CI/CD process and ensure consistency. By creating reusable workflows that can be shared and reused across multiple repositories, you can streamline your development process and maintain a consistent pipeline. This approach can save time, reduce duplication of effort, and help enforce best practices across your organization.
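
In GitHub Actions, a reusable workflow is defined with the workflow_call trigger and invoked from other workflows with the uses keyword. A minimal sketch, where the your-org/shared-workflows repository path and the environment input are placeholders:

Example script: Defining and Calling a Reusable Workflow
# .github/workflows/deploy.yml in the shared repository
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./deploy.sh "${{ inputs.environment }}"   # placeholder command

# Caller workflow in another repository
jobs:
  call-deploy:
    uses: your-org/shared-workflows/.github/workflows/deploy.yml@main
    with:
      environment: staging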

GitHub Actions and GitHub Copilot: Your Partners in Optimization

GitHub Copilot can also be a valuable tool in this optimization process. It can generate code for common tasks, provide code suggestions as you write your workflows, serve as a learning tool if you're new to GitHub Actions, and generate comments and documentation for your code.

Start exploring GitHub Actions and GitHub Copilot today to supercharge your CI/CD pipeline and take your software delivery to the next level.

Visualizing an Optimized CI/CD Pipeline

Developer Workflow - PR, Code Review, Repository Rules

The developer workflow is the first step in our CI/CD pipeline. Developers commit new changes in a pull request, which triggers the rest of the pipeline. Code reviews and repository rules play a crucial role in maintaining code quality at this stage.

GitHub Copilot can assist developers by providing code suggestions and generating code snippets, making the coding process more efficient. By leveraging these tools, developers can ensure that their code adheres to best practices and is ready for the next stages of the pipeline.

stateDiagram
    state Developer-Workflow {
    Commits --> PR: Developers Commit new changes in a Pull Request
    PR --> CodeReview: Code Review
    PR --> RepositoryRules: Enforce clean and efficient version control workflow
    }

Continuous Integration - CI

Continuous Integration (CI) is the next stage in our pipeline. At this stage, the code is built and tested. Efficient optimization of build and test workflows is crucial for maintaining a fast and reliable CI process.

Fast feedback loops allow developers to quickly identify and fix issues, improving code quality and reducing the time to delivery. "Shifting left" with security scans, such as CodeQL analysis, ensures that security is considered early in the development process.

Caching dependencies and parallelizing tests can significantly reduce build times, making the CI process more efficient. By leveraging GitHub Actions and the GitHub Actions Marketplace, you can further optimize your CI process.

stateDiagram
    state Continuous-Integration {
        state Security-Scans {
        Build --> App: CodeQL Analysis
        Build --> Database: Database code scanning
        Build --> Package: Compile
        }
        Build --> JunitTests: Storing Artifacts
        state Parallel-Testing {
        JunitTests --> JunitTest1: Each test runs in \na containerized environment
        JunitTests --> JunitTest2
        JunitTests --> JunitTest3
        JunitTests --> JunitTest4
        JunitTests --> JunitTest..N
        }
        JunitTests --> Publish: If CI passes, \nmerging to main branch \nand publishing Containerized\n App to GitHub\n Container Registry
    }

Continuous Delivery/Deployment - CD

The final stage in our pipeline is Continuous Delivery/Deployment (CD). At this stage, the code is deployed to production.

Optimization and approval gates ensure that only high-quality code is deployed. Different environments, such as staging and production, allow for thorough testing before deployment.

In our example, we use a blue-green deployment strategy, which reduces downtime and risk. Secure authentication, such as OpenID Connect, is crucial for maintaining security during the deployment process.

By reusing workflows, such as a standardized deployment workflow, you can ensure a consistent and efficient CD process. GitHub Actions and the GitHub Actions Marketplace offer a wide range of reusable workflows that can fit your organization's needs.

stateDiagram
    state Continuous-Delivery {
    Publish --> SystemTesting: Pulling Image from GHCR
    SystemTesting --> IntegrationTesting: [staging]
    IntegrationTesting --> AcceptanceTesting: [staging]
    }
    AcceptanceTesting --> Deploy: Login with OpenID Connect and \nDeploy the app to K8s
    Deploy --> ProdInstance1: Blue
    Deploy --> ProdInstance2: Green

Putting it all together

Here's a flow chart that visualizes an optimized CI/CD pipeline using GitHub Actions:

In this diagram:

  • Developers commit new changes in a Pull Request, which triggers the build and unit test suite.
  • The Continuous Integration phase includes security scans, parallel testing, and artifact storage.
  • If the CI passes, the changes are merged to the main branch and the containerized app is published to the GitHub Container Registry.
  • The Continuous Delivery phase involves system testing, integration testing, and acceptance testing before deploying the app to Kubernetes.
  • The app is deployed to multiple production instances using a blue-green deployment strategy.

stateDiagram
    state Developer-Workflow {
    Commits --> PR: Developers Commit new changes in a Pull Request
    PR --> Build: Build & Unit Test Suite
    }
    
    state Continuous-Integration {
        state Security-Scans {
        Build --> App: CodeQL Analysis
        Build --> Database: Database code scanning
        Build --> Package: Compile
        }
        Build --> JunitTests: Storing Artifacts
        state Parallel-Testing {
        JunitTests --> JunitTest1: Each test runs in \na containerized environment
        JunitTests --> JunitTest2
        JunitTests --> JunitTest3
        JunitTests --> JunitTest4
        JunitTests --> JunitTest..N
        }
        JunitTests --> Publish: If CI passes, \nmerging to main branch \nand publishing Containerized\n App to GitHub\n Container Registry
    }

    state Continuous-Delivery {
    Publish --> SystemTesting: Pulling Image from GHCR
    SystemTesting --> IntegrationTesting: [staging]
    IntegrationTesting --> AcceptanceTesting: [staging]
    }
    AcceptanceTesting --> Deploy: Login with OpenID Connect and \nDeploy the app to K8s
    Deploy --> ProdInstance1: Blue
    Deploy --> ProdInstance2: Green