Benchmarking Go Code

Sun, Jun 26, 2022 7-minute read

If you have been programming in Go for some time, you probably really like the tooling that comes with it. The language itself is packed to the brim with useful tooling such as go test, go fmt, go build, etc. We have gone over testing before. Today, we will be looking at how we can benchmark our Go code.

Why you should run benchmarks

Ever wondered how fast and efficient your code runs? Benchmarks are the most straightforward way to check the performance of your code. The benchmark tool provided by Go focuses on these aspects:

Time taken per operation
Memory allocated per operation

Benchmarks aren’t a must, but they can be a good way to check for bottlenecks. Also, running numbers is fun. If you are a content creator, you probably check your feed daily, keeping track of how many new views your recent posts got. Benchmarks give me a similar sense of joy.

Insertion Sort vs. Merge Sort

We will use the following code for this tutorial.

package main

import (
    "fmt"
)

func main() {
    list := []int{3, 4, 1, 5, 2}
    iList := insertionSort(list)
    mList := mergeSort(list)
    fmt.Println(iList, mList)
}

func insertionSort(list []int) []int {
    for i, num := range list {
        j := i - 1

        for j >= 0 && num < list[j] {
            list[j+1] = list[j]
            j -= 1
        }
        list[j+1] = num
    }

    return list
}

func mergeSort(list []int) []int {
    if len(list) > 1 {
        mid := len(list) / 2
        left := list[:mid]
        right := list[mid:]

        mergeSort(left)
        mergeSort(right)

        i, j, k := 0, 0, 0

        for i < len(left) && j < len(right) {
            if left[i] < right[j] {
                list[k] = left[i]
                i++
            } else {
                list[k] = right[j]
                j++
            }
            k++
        }

        for i < len(left) {
            list[k] = left[i]
            i++
            k++
        }

        for j < len(right) {
            list[k] = right[j]
            j++
            k++
        }
    }

    return list
}

The above is an implementation of two popular sorting algorithms: insertion sort and merge sort.

An insertion sort is a simple sorting algorithm that works like this:

Iterate through the list from the beginning.
If the current number is smaller than the previous number, move it to the front until it becomes bigger than the previous number.

Merge sort takes a different approach:

Divide the list in half, so that you end up with a left list and a right list. Do this recursively on the left list and the right list.
When you can’t divide anymore, merge the left list and right list in ascending order.
Merge the lists until you end up with the sorted original list.

These diagrams will help you understand the two sorts better.

Insertion Sort:

https://media.geeksforgeeks.org/wp-content/uploads/insertion_sort-recursion.png

Merge Sort:

https://en.wikipedia.org/wiki/Merge_sort#/media/File:Merge_sort_algorithm_diagram.svg

Benchmarking the code

Let’s write our benchmark code.

package main

import (
    "fmt"
    "math/rand"
    "testing"
)

func BenchmarkInsertionSort(b *testing.B) {
    inputSize := []int{10, 100, 1000, 10000, 100000}
    for _, size := range inputSize {
        b.Run(fmt.Sprintf("input_size_%d", size), func(b *testing.B) {
            testList := make([]int, size)
            for i := 0; i < size; i++ {
                testList[i] = rand.Intn(size)
            }
            b.ResetTimer()

            for i := 0; i < b.N; i++ {
                insertionSort(testList)
            }
        })
    }
}

func BenchmarkMergeSort(b *testing.B) {
    inputSize := []int{10, 100, 1000, 10000, 100000}
    for _, size := range inputSize {
        b.Run(fmt.Sprintf("input_size_%d", size), func(b *testing.B) {
            testList := make([]int, size)
            for i := 0; i < size; i++ {
                testList[i] = rand.Intn(size)
            }
            b.ResetTimer()

            for i := 0; i < b.N; i++ {
                mergeSort(testList)
            }
        })
    }
}

All benchmark codes are named in the format of BenchnmarkXxx, where Xxx is the name of the function being benchmarked starting with a capital letter.

We are going to test the two sorting algorithms using different input sizes. These are defined inside inputSize. For each input size, we launch a b.Run which fires off a single benchmark. b.Run accepts two parameters: first is the name of the benchmark and the second is the actual benchmark function to run. fmt.Sprintf("input_size_%d", size) is a good way to dynamically name our benchmarks, as it will name our benchmarks like so: input_size_10, input_size_100, and so on.

You could make one benchmark function for each input size, but the code will become unnecessarily long, so using b.Run is the idiomatic approach.

testList := make([]int, size)
for i := 0; i < size; i++ {
    testList[i] = rand.Intn(size)
}
b.ResetTimer()

This portion populates our test data. We create a slice of size size, and populate it with random integers. We call b.ResetTimer in the end to reset our benchmark timer. b.Run starts the timer at the time of launching it, so it will by default include the time taken for initializing data. This is misleading, and we reset the timer before running the benchmark.

for i := 0; i < b.N; i++ {
    mergeSort(testList)
}

This is the loop that runs our mergeSort function. The loop runs b.N time, which is an arbitrary number calculated to yield stable and reliable results. It’s like running a lab multiple times to find the average data of different data points to make the results more precise.

Let’s run the benchmark now.

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: example.com/benchmarking
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkInsertionSort/input_size_10-8          127217533                9.369 ns/op           0 B/op          0 allocs/op
BenchmarkInsertionSort/input_size_100-8         13812344                89.95 ns/op            0 B/op          0 allocs/op
BenchmarkInsertionSort/input_size_1000-8         1508656               743.6 ns/op             0 B/op          0 allocs/op
BenchmarkInsertionSort/input_size_10000-8         158587              7365 ns/op               0 B/op          0 allocs/op
BenchmarkInsertionSort/input_size_100000-8             1        1296239500 ns/op               0 B/op          0 allocs/op
BenchmarkMergeSort/input_size_10-8              14895802                76.45 ns/op            0 B/op          0 allocs/op
BenchmarkMergeSort/input_size_100-8              1000000              1045 ns/op               0 B/op          0 allocs/op
BenchmarkMergeSort/input_size_1000-8               89268             14178 ns/op               0 B/op          0 allocs/op
BenchmarkMergeSort/input_size_10000-8               7522            156233 ns/op               0 B/op          0 allocs/op
BenchmarkMergeSort/input_size_100000-8               678           1743882 ns/op               0 B/op          0 allocs/op
PASS
ok      example.com/benchmarking        15.215s

We run go test like how we would run tests. Then we pass these two flags:

-bench specifies which benchmark function to run. To run all of them, pass -bench=..
-benchmem shows you how much memory is allocated per operation, and how many allocations occur.

To the left, you can see the names of the benchmark functions. The -8 at the end means that we are using 8 cores of our CPU.

Next to that is a huge number that decreases as our input size increases. These are your b.N, or how many times the benchmark has run your function. We call this the number of operations.

To the right of that, you can see the average time taken per operation. This means that for every run of the function, approximately x nanoseconds passed.

The last two columns show the average amount of memory bytes used for each operation, and the number of allocations per operation.

You can see that for smaller input sizes of 10,000 and below, the insertion sort is more efficient. However, from that point onwards, merge sort is much more efficient.

Both are memory efficient because they both do not require much allocation of memory. In Go, taking slices doesn’t cost memory. More on that [here](pointers - Slices in golang do not allocate any memory? - Stack Overflow).

Conclusion

Benchmarks are a useful tool to check if your code is running efficiently. This leaves us with one question: this means that we don’t need to benchmark our code if we write a well-optimized code from the get-go, right? In a perfect world, this is true. However, I’d advise against it, as countless other people have.

Premature optimization is the root of all evil.

–Donald Knuth, the legendary computer scientist, and mathematician

Focus on writing readable, working code. Then run a benchmark if you start running into bottlenecks. Maintainability is more important than performance in many cases, especially when modern computers are so fast.

Thank you for reading! You can also read this post on Medium and Dev.to.