Go scheduler and preemption on tomleb's blog
I recently started doing a deep dive into the go internals, more specifically the scheduler. One of the first thing I came across is that go uses a cooperative work-stealing scheduler.
The cooperative part means that the program cooperates with the runtime to schedule or deschedule goroutines. I don’t have the expertise to go over the pros and cons of it, but one disadvantage with a fully cooperative language is that long computation can prevent other goroutines to be switched until finished. This was reported as Issue #11462 with the following code snippet taking a lot more than 100ms to print.
package main
import (
"fmt"
"runtime"
"time"
)
//go:noinline
func cpuIntensive(p *int) {
for i := 1; i <= 1000000000000000000; i++ {
*p = i
}
}
func main() {
runtime.GOMAXPROCS(1)
x := 0
go cpuIntensive(&x)
time.Sleep(100 * time.Millisecond)
// printed only after cpuIntensive is completely finished
fmt.Printf("x = %d.\n", x)
}
This program first limits the number of process to 1, and then starts a long computation goroutine. According to the issue, due to being cooperative, the code is supposed to finish the computation before printing even though we’re creating a goroutine. However, when I tried running the code (with version 1.16), I got output before the end of the computation. How come?
Turns out asynchronous preemption was added to go recently1. Preemption means that a running goroutine can be interrupted (similar to how OS threads are interrupted by the OS) if it is safe to do so.
There are two interesting questions I wanted to find the answer for:
-
How does this asynchronous preemption works?
-
How does the runtime determine if a running goroutine can be preempted at its current instruction?
Let’s dig in!
How does asynchronous preemption works ?
Go uses a signal-based preemption model. It configures a signal handler for the
SIGURG
signal, which was chosen for various reasons. Upon receiving this
signal, the goroutine will stop execution, leaving its place to the runtime (we
don’t consider this being preempted in this context). The runtime will then
decide if it is safe to suspend the goroutine to run another one. Alright, but
who sends the SIGURG
signal? Answer: sysmon.
At startup, the runtime creates a System Monitor (sysmon) thread. According
to Go: sysmon, Runtime Monitoring, the sysmon thread is not taken into account
by the runtime, which more importantly means that it is always running
(concurrently with other OS threads). As you can see from the simplified code
below, func sysmon() is basically an endless loop that sleeps and attempts to
preempt goroutines (among other things) by sending the SIGURG
signal.
func sysmon() {
idle := 0 // how many cycles in succession we had not wokeup somebody
delay := uint32(0)
...
for {
...
usleep(delay)
...
now = nanotime()
...
// retake P's blocked in syscalls
// and preempt long running G's
if retake(now) != 0 {
idle = 0
} else {
idle++
}
...
}
}
Here’s what the simplified retake() function looks like. It preempts goroutines every 10ms.
// forcePreemptNS is the time slice given to a G before it is
// preempted.
const forcePreemptNS = 10 * 1000 * 1000 // 10ms
func retake(now int64) uint32 {
n := 0
...
for i := 0; i < len(allp); i++ {
_p_ := allp[i]
if _p_ == nil {
// This can happen if procresize has grown
// allp but not yet created new Ps.
continue
}
pd := &_p_.sysmontick
s := _p_.status
sysretake := false
if s == _Prunning || s == _Psyscall {
// Preempt G if it's running for too long.
t := int64(_p_.schedtick)
if int64(pd.schedtick) != t {
pd.schedtick = uint32(t)
pd.schedwhen = now
} else if pd.schedwhen+forcePreemptNS <= now {
preemptone(_p_)
// In case of syscall, preemptone() doesn't
// work, because there is no M wired to P.
sysretake = true
}
}
...
}
...
}
For sake of completeness, if you follow the preemptone()
source code, you will
end up in preemptM() in signal_unix.go.
const sigPreempt = _SIGURG
...
func preemptM(mp *m) {
...
signalM(mp, sigPreempt)
...
}
The following diagram illustrates the difference between go with and without async preemption. Without sysmon sending the signal, the goroutine with a long computation is never preempted and must finish before another one can run. However, with sysmon sending the signal, then the goroutine can be preempted and give CPU time for another goroutine. This is simplified and skips details but it gives an idea.
Now that we have an idea of how, let’s find out in which circumstances the runtime is allowed to preempt a goroutine.
Determining preemptability of current function
Having no experience with compiler design, the fact that the runtime is able to determine the preemptability of a goroutine at the PC (program counter) level is fascinating to me.
At compile time, go adds some metadata into a table structure called the
PCDATA
table. More specifically, the PCDATA_UnsafePointer
table. You can
find more information about the internal design of this table at Go 1.2 Runtime
Symbol Information, although I’m not sure how up to date it is.
When the SIGURG
signal is received, the runtime will look at this table to
verify whether the goroutine can be preempted and also what the PC will be once
the goroutine will be rescheduled (the resumption PC). The following flowchart
shows the logic to determine preemptability. This is defined in the
isAsyncSafePoint() function.
If we go back to the initial program shown in this post, we can easily force the
old behavior by making the go compiler mark cpuIntensive()
as unsafe. One way
to mark it as unsafe is to make it a nosplit function with a compiler
directive. The code then becomes:
package main
import (
"fmt"
"runtime"
"time"
)
//go:noinline
//go:nosplit
func cpuIntensive(p *int) {
for i := 1; i <= 1000000000000000000; i++ {
*p = i
}
}
func main() {
runtime.GOMAXPROCS(1)
x := 0
go cpuIntensive(&x)
time.Sleep(100 * time.Millisecond)
// printed only after cpuIntensive is completely finished
fmt.Printf("x = %d.\n", x)
}
Now, if you build and run this code, the output will appear after the
cpuIntensive()
function has terminated.
go run main.go
-
It was added in the commit runtime: implement async scheduler preemption in go version 1.14. ↩︎
Contribute to the discussion in my public inbox by sending an email to ~tomleb/public-inbox@lists.sr.ht [mailing list etiquette]