Arena Allocation in Go

One of the weaknesses of Go’s runtime today is the relatively naive GC implementation. This is evident from go performing consistently worse than most other languages in the binary trees benchmark. However, the language can make designing programs that reduce GC cost fairly straightforward.

On my laptop, the Go program runs in 32.32s and consumes 348.3 MB of resident memory. In comparison, the Haskell version (with “+RTS -N4 -K128M -H -RTS”) runs in 16.06s and consumes 772.6MB!

The code used in the benchmark creates and discards several binary trees by allocating each node individually. A small change in the code to allocate several nodes whenever allocation is required, instead of just one, can significantly reduce the amount of work performed by GC. This technique is called arena allocation.

The NodeArena type’s Get method returns a pointer to an element in the slice and allocates in chunks of 10,000 when the slice becomes empty.

type NodeArena []Node

func (na *NodeArena) Get() *Node {
    if len(*na) == 0 {
        *na = make([]Node, 10000)

    n := &(*na)[len(*na)-1]
    *na = (*na)[:len(*na)-1]
    return n

The bottomUpTree function that creates the trees takes a pointer to NodeArena and uses it to create the nodes for the trees.

func bottomUpTree(item, depth int, arena *NodeArena) *Node {
    n := arena.Get()
    if depth <= 0 {
        *n = Node{item, nil, nil}
    } else {
        *n = Node{
            bottomUpTree(2*item-1, depth-1, arena),
            bottomUpTree(2*item, depth-1, arena),
    return n

With arena allocation, the Go code runs in 15.73s and consumes 268.9 MB of memory. This fairly trivial change reduced the running time by 51% and memory consumption by 22.8%!

For the curious, if I give the Go GC as much memory as the Haskell version by setting GOGC=425, it runs in under 9s!

Note: All measurements are using go 1.3 and ghc 7.8.2, with the same build and runtime options as in the benchmarks game.

07 Jul, 2014 golang / allocation / arena