Performance
Performance optimization in Go requires understanding its garbage collector and the distinction between stack and heap memory allocations. Optimizing code to reduce memory usage and minimize garbage collection work improves the scalability and responsiveness of Go applications.
Garbage collection
Blog post about Go’s garbage collector
Garbage collection prevents memory leaks, dangling pointers, and double frees. Go’s garbage collector (GC) decides which memory to free by tracking allocations on the heap, freeing allocations that are no longer needed, and keeping allocations that are still in use. The GC does this with tracing; Go does not use reference counting.
Tracing
Go’s tracing GC traces objects that are reachable by a chain of references from root objects, and it considers all other objects as garbage. Roots are objects that are always considered “alive” because the program can access them directly:
- Global variables
- Local variables on goroutine stacks
- Function arguments
- CPU registers
- Runtime-internal references
GC algorithm
Go uses a concurrent, tri-color mark-and-sweep algorithm. Concurrent means most of the work runs alongside your code, so the program does not have to stop for the entire collection (there are still two brief stop-the-world pauses per cycle). Tri-color describes how the GC labels objects during cleanup. Mark and sweep is split into two parts:
1: Mark
This phase runs mostly concurrently with program execution. The first part of the phase is stop-the-world (STW), where the program pauses and Go identifies the root set to figure out what is in use.
Next, objects are labeled by color to determine whether they are garbage collected:
- white: The initial color for all objects; no decision has been made yet. Anything still white when marking finishes is garbage.
- gray: Objects known to be reachable (found from the roots or from another gray object) but whose own references have not been scanned yet.
- black: Objects that are in use and fully scanned.
During marking, the Go runtime allocates about 25% of CPU resources so that the GC is efficient enough to keep memory usage in check without overwhelming the system. It also allocates about 5% of CPU for mark assists, where goroutines help with marking if the GC is lagging due to memory allocations during the GC cycle.
2: Sweep
Objects still marked white are garbage collected, and their memory is deallocated. This helps avoid memory leaks, but it can cause latency spikes on apps with large heaps.
Stack
The stack stores variables that are tied to the function calls that create them:
- Local variables
- Function parameters
- Return values
The stack is efficient because it works Last In, First Out (LIFO): allocating and freeing are just adjustments to the stack pointer. The stack is small, and putting too much on it results in a stack overflow.
Heap
The heap stores variables that are less predictable and not explicitly tied to where they were created. These variables must live beyond the scope of the function they were created in. The heap is more flexible and dynamic, but this means it is slower in allocating memory. The GC manages heap cleanup, which adds overhead and reduces performance.
GC Tooling
GOGC
The GOGC environment variable is how you tune the GC. It accepts any positive integer and defaults to 100, which means a new GC cycle is triggered when the heap has grown by 100% over the live heap left after the previous cycle. If you set it to 50, the GC runs more frequently and keeps the heap size smaller, which uses more CPU time. Setting it higher, for example to 200, means the GC runs less frequently, so the app uses more memory but spends less CPU on collection. You can also set it to off for programs that do not need the GC.
GC pacer
The GC pacer regulates the timing of the GC cycles to balance the need to reclaim memory with the need to keep the program running efficiently. It decides when to run the GC based on the heap size and the allocation rate. It uses the GOGC setting as a guideline to determine thresholds that trigger a GC cycle.
The pacer is adaptive: if an app allocates memory quickly, it runs the GC more often.
GODEBUG
GODEBUG is an environment variable that gives insight into the Go runtime. Use this when you want to find memory leaks or optimize the GC overhead.
The following setting is often used to gain detailed information about garbage collection processes. It outputs detailed information for each GC cycle so you can understand the GC’s impact on your application performance:
GODEBUG=gctrace=1 go run .
gc 1 @0.004s 2%: 0.058+0.49+0.035 ms clock, 0.70+0.29/0.64/0+0.42 ms cpu, 3->4->0 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 12 P
- gc 1: Sequence number of the GC cycle
- @0.004s: Time, in seconds, since the program started
- 2%: Percentage of total program time spent in GC so far
- 0.058+0.49+0.035 ms clock: Wall-clock breakdown of the cycle: 0.058 for the STW sweep-termination phase, 0.49 for the concurrent mark-and-scan phase, and 0.035 for the STW mark-termination phase
- 0.70+0.29/0.64/0+0.42 ms cpu: CPU-time breakdown of the same phases: 0.70 for sweep termination, 0.29/0.64/0 for mark and scan (assist/background/idle), and 0.42 for mark termination
- 3->4->0 MB: Heap size at GC start, heap size at GC end, and live heap afterward
- 4 MB goal: Target heap size for the next GC cycle
- 0 MB stacks: Estimated size of scannable goroutine stacks
- 0 MB globals: Estimated size of scannable global variables
- 12 P: Number of processors used
Memory ballast
Deprecated
Memory ballasts are relevant only for Go versions before 1.19. Newer versions manage heap size with the GOMEMLIMIT environment variable.
A memory ballast is a large allocation of memory whose only purpose is to influence the behavior of the GC. The GC decides when to run based on heap size: with the default GOGC=100, a cycle begins when the heap doubles the live size left after the previous cycle. A memory ballast artificially increases the heap size, which raises the GC threshold and causes the GC to run less often. Here is an example implementation:
var ballast []byte

func init() {
	ballast = make([]byte, 10<<30) // 10 GiB (example)
}
GOMEMLIMIT
GOMEMLIMIT is a soft cap on the memory usage of the Go runtime. It gives your application a memory budget and removes the need for manual tweaks like memory ballasts. It is a numeric value measured in bytes. You can add a unit suffix for clarity, such as B, KiB, MiB, or GiB. See IEC 80000-13 for more suffixes.
Performance analysis
Escape analysis
Escape analysis is how Go’s compiler decides whether a variable goes to the stack or the heap:
- If the lifetime of a variable doesn’t “escape” the function it’s in, it goes on the stack. For example, if a function has no return value, none of its local variables escape, so they are stored on the stack. If it returns a pointer to an integer, that pointer does escape the function.
- If the variable is passed around or returned from a function, then it “escapes” to the heap.
The goal is to improve memory usage by allocating variables on the stack when possible because the stack is faster and more CPU-cache friendly.
In Go, no goroutine can have a pointer to another goroutine’s stack. This influences where the compiler stores variables. Here is a sample program and its escape analysis:
Inlining functions
//go:noinline is a compiler directive that tells the Go compiler to always emit a real function call with its own stack frame. Otherwise, the compiler might replace calls to simple functions with the function body (inlining) to improve performance, which changes the stack frames. //go:noinline is helpful for benchmarking and escape analysis.
type person struct {
	name string
	age  int
}

func main() {
	p := createPerson()
	fmt.Println(p)
}

//go:noinline
func createPerson() *person {
	p := person{
		name: "Steve Stevens",
		age:  100,
	}
	return &p
}

To view how Go performs escape analysis, use the -gcflags "-m -m" option. The annotated lines in the output show:
- 1: Go assigns each function a complexity cost to determine whether inlining is beneficial. Here, it is not beneficial to inline main.
- 2: p is moved to the heap because createPerson returns a pointer to it.
go run -gcflags "-m -m" .
# perf
./main.go:16:6: cannot inline createPerson: marked go:noinline
./main.go:10:6: cannot inline main: function too complex: cost 83 exceeds budget 80 # 1
./main.go:12:13: inlining call to fmt.Println
./main.go:17:2: p escapes to heap: # 2
./main.go:17:2: flow: ~r0 = &p:
./main.go:17:2: from &p (address-of) at ./main.go:21:9
./main.go:17:2: from return &p (return) at ./main.go:21:2
./main.go:17:2: moved to heap: p
./main.go:12:13: ... argument does not escape
&{Steve Stevens 100}
Pointers
A pointer is a variable that holds the address of another variable. Pointers “point” to where the value lives, whether that is on the stack or the heap. Pointers let you read or change a variable’s value without copying the variable itself.
Declaring and getting the value from a pointer requires the following syntax:
- Declare a pointer with an asterisk (*) before the type.
- Store the address of a variable with the address-of operator (&).
- Get the value that a pointer points to by dereferencing the pointer, also with the asterisk (*). This is like saying, “Follow the pointer to the address and get the value.”
var pointer *int      // declare a pointer to an int
x := 10
pointer = &x          // store the address of x
fmt.Println(*pointer) // dereference: prints 10
Best practices
Follow these best practices when using pointers:
- Null pointer checks: Check whether a pointer is nil before dereferencing it. This avoids runtime panics.
- Passing pointers: Use pointers when passing large structs to functions to avoid copying the entire structure.
Stack
The stack is where all local variables live until their function calls end. The stack is a LIFO data structure: the data for each function is pushed onto the stack when the function is called, then popped off and cleared when the function returns. Go has a limited stack size, but stacks are also dynamic and can resize as needed.
When a function is called, Go allocates space on the stack for its local variables. Each function call creates a stack frame that contains all the necessary information for the function, which includes the local variables, arguments, and return address.
Heap
The heap is a much larger, more flexible space to store data. It is a less structured area of memory, which means it is slower than the stack.
Pointers, stack, and heap
To illustrate, create a simple program that uses a linked list. The code consists of the following:
- Node: Struct that models each node in the list.
- buildList: Function that creates a three-node list and returns a pointer to the first node in the list.
- main: Entrypoint that builds the list and prints all its values.
The storage location for each value is determined by how it is used. To summarize: the pointer to the head of the linked list lives on the stack, the nodes live on the heap, and the pointers within the nodes live on the heap:
- Node is a type declaration that lives in the program’s compiled metadata. It does not live on either the stack or the heap.
- main calls buildList, a function that returns a pointer to a Node instance. head is stored as a local variable on main’s stack frame.
- buildList must allocate its nodes on the heap so they can outlive the call. Its return value is the indicator: return &n1 escapes the function and survives after buildList returns, and because n1 points to n2 and n3, they must also be allocated on the heap.
- Because the n* variables are on the heap, the n*.next values are stored on the heap too, because that’s where they are created. Each n*.next is a pointer in one heap object that points to another heap object.
type Node struct { // 1
	value int
	next  *Node
}

func main() {
	head := buildList() // 2
	fmt.Println(
		head.value,
		head.next.value,
		head.next.next.value,
	)
}

func buildList() *Node {
	n1 := Node{value: 1} // 3.1
	n2 := Node{value: 2} // 3.1
	n3 := Node{value: 3} // 3.1
	n1.next = &n2        // 3.2
	n2.next = &n3
	return &n1 // 3.1
}