Debugging distributed programs is difficult, but we’ve been hard at work developing tools to make it easier within the Goblins object capability system. While this work is still in its early stages, we’re excited to share the progress we’ve made so far!
The new Goblins debugger allows programmers to inspect and debug distributed computations that happen across many vats (communicating event loops.) The time travel feature allows for visiting past events and inspecting program state at the time the event happened.
These tools are implemented as “meta-commands” for Guile's REPL, complementing the tools that Guile already provides for sequential debugging. In true Scheme fashion, this means that debugging happens live, while the program is running, allowing programmers to fix bugs and try again without having to stop and recompile/reboot their program.
Seeing is believing, so let’s walk through building a simple program and using the new debugger to find and fix a small bug. We’re going to make the world’s simplest dungeon crawling RPG. It will allow a player to attack a monster to deal damage, and that’s about it.
The first step is to start up a Guile REPL and import the Goblins modules that we need:
scheme@(guile-user)> ,use (goblins)
scheme@(guile-user)> ,use (goblins actor-lib cell)
scheme@(guile-user)> ,use (goblins actor-lib let-on)
scheme@(guile-user)> ,use (goblins actor-lib methods)
The (goblins)
module provides the core Goblins API. The other three
modules are part of Goblins’ handy standard library of actor
abstractions. In Guile's REPL, expressions that start with a comma
invoke a meta-command. Guile provides a number of these commands,
like ,use
. We’ll be seeing commands specific to Goblins later.
Next, we need to create some vats where our actors can live. Since
there is overhead in collecting debug data, vat logging is currently
opt-in on a per-vat basis. The #:log? #t
keyword argument enables
logging.
scheme@(guile-user)> (define p-vat (spawn-vat #:name 'Player #:log? #t))
scheme@(guile-user)> (define m-vat (spawn-vat #:name 'Monster #:log? #t))
scheme@(guile-user)> (define c-vat (spawn-vat #:name 'Cave #:log? #t))
Now we can define some actors to live in our vats. To do that, we first need to define constructor procedures. We only need two kinds of actors for this demo: Characters and dungeons. Characters have a name, some amount of hit points (HP), and a strength value. They can be damaged by attacks and die when their HP reaches 0.
scheme@(guile-user)>
(define (^character _bcom name max-hp strength)
;; HP is reduced when the character is attacked, so we use a cell
;; actor to store a counter that can be modified.
(let ((hp (spawn ^cell max-hp)))
(methods
;; Getters
((name) name)
((strength) strength)
((hp) ($ hp))
;; Deal combat damage
((damage amount)
($ hp (- ($ hp) amount))
#t) ; attacks always hit
;; Check if character is dead
((dead?)
(zero? ($ hp))))))
Dungeons are places where a player character fights a monster character. For simplicity, this is going to be a rather one-sided battle and we’ll only implement the player’s turn. A player turn starts by dealing damage to the monster and ends by returning a log summarizing what happened. If the monster is killed, the log will say so.
scheme@(guile-user)>
(define (^dungeon _bcom player monster)
(methods
((player-turn)
(let-on ((player-name (<- player 'name))
(monster-name (<- monster 'name))
(strength (<- player 'strength)))
;; Deal damage to the monster and check to see if it's dead.
(let*-on ((hit? (<- monster 'damage strength))
(dead? (<- monster 'dead?)))
(let ((damage-msg (format #f "~a deals ~a damage to ~a!"
player-name strength monster-name)))
;; Log a message if the monster is dead.
(if dead?
(let ((dead-msg (format #f "~a has been defeated!"
monster-name)))
(list damage-msg dead-msg))
(list damage-msg))))))))
To show off the “distributed” part of “distributed debugger”, we’ll spawn actors across all three of the vats we defined earlier. The player character “Alice” will live in the Player vat, p-vat. The monster, a basilisk, will live in the Monster vat, m-vat. The dungeon will live in the Cave vat, c-vat. This simulates how, in a real game built using a distributed architecture, different actors in the game world would be hosted on many different computers.
scheme@(guile-user)> (define alice
(with-vat p-vat (spawn ^character "Alice" 100 30)))
scheme@(guile-user)> (define basilisk
(with-vat m-vat (spawn ^character "Basilisk" 20 15)))
scheme@(guile-user)> (define dungeon
(with-vat c-vat (spawn ^dungeon alice basilisk)))
With our actors spawned, we are ready to play this trivial game. But first, we’ll use a Goblins-specific REPL command to enter a new context for evaluating code:
scheme@(guile-user)> ,enter-vat c-vat
Entering vat 'Cave'. Type ',q' to exit. Type ',help goblins' for help.
goblins/Cave@(guile-user) [1]>
Note the change in the REPL prompt to indicate that we are now in the
Goblins vat named “Cave” and we are in a nested REPL session that is one
level deep. The ,enter-vat
command is provided by Goblins to allow
code (and other Goblins REPL commands) to be evaluated within the
context of a specific vat. This makes Goblins operators (such as
spawn
, $
, and <-
) “just work” since they require a vat in order
to do their job.
We're now ready to have Alice take a turn. Alice will attack the
basilisk, dealing 30 damage. The basilisk only has 20 HP, so it
should die and we should see a log message saying so. Getting a
result requires communicating with the Player and Monster vats, so we’ll
use the the ,vat-resolve
command to wait for the asynchronous operation
to complete and print out the results:
goblins/Cave@(guile-user) [1]> ,vat-resolve ($ dungeon 'player-turn)
("Alice deals 30 damage to Basilisk!")
Hmm, that’s not right! The log says that Alice dealt 30 damage, but
it should also say that the basilisk was defeated. Let’s debug and
see if we can spot the error. For starters, we can run ,vat-tail
to
see the most recent messages that the Cave vat has processed:
goblins/Cave@(guile-user) [1]> ,vat-tail
Churn 7:
36: (receive listen #<local-promise>)
Churn 8:
37: (receive message #<local-object ^resolver> fulfill #f)
38: (receive message #<local-object ^on-listener> fulfill #f)
39: (receive message #<local-object fulfilled-handler> #f)
40: (receive message #<local-object ^resolver> fulfill ("Alice deals 30 damage to Basilisk!"))
41: (receive message #<local-object ^resolver> fulfill ("Alice deals 30 damage to Basilisk!"))
42: (receive message #<local-object ^resolver> fulfill ("Alice deals 30 damage to Basilisk!"))
43: (receive message #<local-object ^on-listener> fulfill ("Alice deals 30 damage to Basilisk!"))
44: (receive message #<local-object fulfilled-handler> ("Alice deals 30 damage to Basilisk!"))
45: (receive message #<local-object finally-handler>)
The events have been printed in chronological order. There is some
useful information in here, but only events in the Cave vat are shown and
there’s nothing that describes how these events relate to one another.
However, we now know that event 45 is the most recent. This event
represents the terminal point of the ,vat-resolve
command we
evaluated above.
To view the events related to another event, across all relevant vats,
we will use the ,vat-tree
and ,vat-graph
commands. ,vat-tree
outputs a textual representation:
goblins/Cave@(guile-user) [1]> ,vat-tree 45
Vat Cave, 2: (message #<local-object ^call-with-vat>)
├▸ Vat Player, 4: (message #<local-object ^character> name)
│ └▸ Vat Cave, 10: (message #<local-object ^resolver> fulfill "Alice")
├▸ Vat Monster, 5: (message #<local-object ^character> name)
│ └▸ Vat Cave, 16: (message #<local-object ^resolver> fulfill "Basilisk")
│ └▸ Vat Cave, 19: (message #<local-object ^resolver> fulfill ("Alice" "Basilisk" 30))
│ └▸ Vat Monster, 23: (message #<local-object ^character> damage 30)
│ └▸ Vat Cave, 28: (message #<local-object ^resolver> fulfill #t)
│ └▸ Vat Monster, 32: (message #<local-object ^character> dead?)
│ └▸ Vat Cave, 37: (message #<local-object ^resolver> fulfill #f)
│ └▸ Vat Cave, 40: (message #<local-object ^resolver> fulfill ("Alice deals 30 damage to Basilisk!"))
│ └▸ Vat Cave, 41: (message #<local-object ^resolver> fulfill ("Alice deals 30 damage to Basilisk!"))
└▸ Vat Player, 6: (message #<local-object ^character> strength)
└▸ Vat Cave, 13: (message #<local-object ^resolver> fulfill 30)
And ,vat-graph
outputs a graphical Lamport causality diagram (thanks
to Graphviz):
goblins/Cave@(guile-user) [1]> ,vat-graph 45
In each of these views, we can see that the message (<- monster 'dead?)
(with the debug representation of msg ^character dead?
in
the ,vat-graph
output) resolves to #f
when we expected #t
. Did
you miss it? Here’s the relevant bit of the ,vat-tree
output above:
│ └▸ Vat Monster, 32: (message #<local-object ^character> dead?)
│ └▸ Vat Cave, 37: (message #<local-object ^resolver> fulfill #f)
But why did this happen? The monster should have been dead. Let’s
travel back in time and find out. The first step is to truly enter
the debugger so we can walk backwards in time to when the dead?
check happened. To do this, we’ll use the ,vat-debug
command, which
takes an event ID argument and starts a new sub-REPL with all the
context we need to debug the problem.
goblins/Cave@(guile-user) [1]> ,vat-debug 45
Entering a new prompt. Type `,q' to exit.
goblins/Cave@(guile-user) [2]>
Note that the [1]
in the prompt became a [2]
because we are in yet
another sub-REPL. To get our bearings, we’ll use the ,vat-trace
command to get a distributed backtrace showing the direct causes of
event 45:
goblins/Cave@(guile-user) [2]> ,vat-trace
In vat Cave:
Churn 2:
2: (message #<local-object ^call-with-vat>)
4: (message #<local-object ^character> name)
In vat Monster:
Churn 2:
6: (message #<local-object ^resolver> fulfill "Basilisk")
In vat Cave:
Churn 5:
17: (message #<local-object ^on-listener> fulfill "Basilisk")
18: (message #<local-object fulfilled-handler> "Basilisk")
19: (message #<local-object ^resolver> fulfill ("Alice" "Basilisk" 30))
20: (message #<local-object ^on-listener> fulfill ("Alice" "Basilisk" 30))
21: (message #<local-object fulfilled-handler> ("Alice" "Basilisk" 30))
22: (message #<local-object ^character> damage 30)
In vat Monster:
Churn 3:
24: (message #<local-object ^resolver> fulfill #t)
In vat Cave:
Churn 7:
29: (message #<local-object ^on-listener> fulfill #t)
30: (message #<local-object fulfilled-handler> #t)
31: (message #<local-object ^character> dead?)
In vat Monster:
Churn 4:
33: (message #<local-object ^resolver> fulfill #f)
In vat Cave:
Churn 8:
38: (message #<local-object ^on-listener> fulfill #f)
39: (message #<local-object fulfilled-handler> #f)
40: (message #<local-object ^resolver> fulfill ("Alice deals 30 damage to Basilisk!"))
41: (message #<local-object ^resolver> fulfill ("Alice deals 30 damage to Basilisk!"))
43: (message #<local-object ^on-listener> fulfill ("Alice deals 30 damage to Basilisk!"))
45: (message #<local-object finally-handler>)
Much like a regular backtrace in Guile’s sequential debugger, output
is in order from outermost call to innermost. In other words, the
most recent event is the last event printed. Our debugging session
starts focused on event 45, but we can shift our focus with the
,vat-up
, ,vat-down
, and ,vat-jump
commands.
,vat-up
moves up one event, from 45 to 43 in this case:
goblins/Cave@(guile-user) [2]> ,vat-up
Now in vat Cave, event 43:
(receive message #<local-object ^on-listener> fulfill ("Alice deals 30 damage to Basilisk!"))
,vat-down
moves down one event, back to event 45 in this case:
goblins/Cave@(guile-user) [2]> ,vat-down
Now in vat Cave, event 45:
(receive message #<local-object finally-handler>)
,vat-jump
moves to a particular event ID, so we can save ourselves
the tedium of typing ,vat-up
or ,vat-down
a bunch of times if the
event we're trying to inspect is far away from our current position.
In fact, event 31 looks particularly interesting, since that’s when
the dead?
check happened. Let's jump there now:
goblins/Cave@(guile-user) [2]> ,vat-jump 31
Now in vat Cave, event 31:
(send message #<local-object ^character> dead?)
What ,vat-trace
is hiding, for brevity’s sake, is that this “send” event in the Cave vat has a corresponding “receive” event in the Monster vat. Let’s step down the trace and see where we end up:
goblins/Cave@(guile-user) [2]> ,vat-down
Now in vat Monster, event 32:
(receive message #<local-object ^character> dead?)
We’re now focused on an event in vat Monster, where the basilisk lives.
This means that we can travel back in time and send messages to the
basilisk as it existed at event 32. To do this, we'll use the
,vat-peek
command. Let's see what its HP looked like at this point:
goblins/Cave@(guile-user) [2]> ,vat-peek basilisk 'hp
-10
Hmm, zero should be the lowest HP value. The dead?
method is
specifically checking if the character’s HP is zero. It’s a safe bet
that the bug is in the damage
method, which happened around event
22, so let’s confirm our suspicion. We’ll jump there, move down into
an event in the Monster vat, and check out the basilisk's HP:
goblins/Cave@(guile-user) [2]> ,vat-jump 22
Now in vat Cave, event 22:
(send message #<local-object ^character> damage 30)
goblins/Cave@(guile-user) [2]> ,vat-down
Now in vat Monster, event 23:
(receive message #<local-object ^character> damage 30)
goblins/Cave@(guile-user) [2]> ,vat-peek basilisk 'hp
20
Thanks to time travel, we can see that the starting HP value is 20, like it should be. The basilisk’s health was indeed initialized correctly. Moving down one more event, we can see for sure that this is the point where we entered a problematic state:
goblins/Cave@(guile-user) [2]> ,vat-down
Now in vat Monster, event 24:
(send message #<local-object ^resolver> fulfill #t)
goblins/Cave@(guile-user) [2]> ,vat-peek basilisk 'hp
-10
A note for the curious: Goblins uses transactional heaps to manage actor state, enabling us to take snapshots of a vat before each event is processed. Querying these snapshots allows users to travel through time while debugging. We’re using very simple code for demonstration purposes, but we believe that this feature will make debugging real programs, with complex logic and state, much more feasible.
So, what we’re missing is some code to enforce a lower bound on the HP value. Let’s update our character constructor to fix that issue (exiting the nested REPLs and getting back to the top-level Guile prompt):
goblins/Cave@(guile-user) [2]> ,quit
goblins/Cave@(guile-user) [1]> ,quit
scheme@(guile-user)>
(define (^character _bcom name max-hp strength)
;; HP is reduced when the character is attacked, so we use a cell
;; actor to store a counter that can be modified.
(let ((hp (spawn ^cell max-hp)))
(methods
;; Getters
((name) name)
((strength) strength)
((hp) ($ hp))
;; Deal combat damage
((damage amount)
;; HP can't go below 0.
($ hp (max (- ($ hp) amount) 0))
#t) ; attacks always hit
;; Check if character is dead
((dead?)
(zero? ($ hp))))))
A call to max
has been added to the damage
method to ensure that
zero is the lowest HP value. All that’s left to do now is rebuild the
actors and verify the fix:
scheme@(guile-user)> (define alice
(with-vat p-vat (spawn ^character "Alice" 100 30)))
scheme@(guile-user)> (define basilisk
(with-vat m-vat (spawn ^character "Basilisk" 20 15)))
scheme@(guile-user)> (define dungeon
(with-vat c-vat (spawn ^dungeon alice basilisk)))
scheme@(guile-user)> ,enter-vat c-vat
Entering vat 'Cave'. Type ',q' to exit. Type ',help goblins' for help.
goblins/Cave@(guile-user) [1]> ,vat-resolve ($ dungeon 'player-turn)
("Alice deals 30 damage to Basilisk!" "Basilisk has been defeated!")
We see the Basilisk has been defeated
message now. Problem solved!
We hope this small demo has given you a taste of the exciting things
to come in the next Guile Goblins release. If you want to use these
tools right now, all the code used for this demo is available in the
main
branch of the Goblins Git
repository. While it is
already quite useful, right now the debugger only supports multiple
vats in a single Guile process. In the near future, we plan to add
network debugging via CapTP. We’d also like to take this opportunity
to give credit to E’s Causeway
debugger as a
significant source of inspiration and prior art that was crucial for
this work.
If you'd like to try this yourself, here's the cut-and-paste text you can use...
,use (goblins)
,use (goblins actor-lib cell)
,use (goblins actor-lib let-on)
,use (goblins actor-lib methods)
(define p-vat (spawn-vat #:name 'Player #:log? #t))
(define m-vat (spawn-vat #:name 'Monster #:log? #t))
(define c-vat (spawn-vat #:name 'Cave #:log? #t))
(define (^character _bcom name max-hp strength)
;; HP is reduced when the character is attacked, so we use a cell
;; actor to store a counter that can be modified.
(let ((hp (spawn ^cell max-hp)))
(methods
;; Getters
((name) name)
((strength) strength)
((hp) ($ hp))
;; Deal combat damage
((damage amount)
($ hp (- ($ hp) amount))
#t) ; attacks always hit
;; Check if character is dead
((dead?)
(zero? ($ hp))))))
(define (^dungeon _bcom player monster)
(methods
((player-turn)
(let-on ((player-name (<- player 'name))
(monster-name (<- monster 'name))
(strength (<- player 'strength)))
;; Deal damage to the monster and check to see if it's dead.
(let*-on ((hit? (<- monster 'damage strength))
(dead? (<- monster 'dead?)))
(let ((damage-msg (format #f "~a deals ~a damage to ~a!"
player-name strength monster-name)))
;; Log a message if the monster is dead.
(if dead?
(let ((dead-msg (format #f "~a has been defeated!"
monster-name)))
(list damage-msg dead-msg))
(list damage-msg))))))))
(define alice (with-vat p-vat (spawn ^character "Alice" 100 30)))
(define basilisk (with-vat m-vat (spawn ^character "Basilisk" 20 15)))
(define dungeon (with-vat c-vat (spawn ^dungeon alice basilisk)))
,enter-vat c-vat
,vat-resolve ($ dungeon 'player-turn)
,vat-tail
,vat-tree 45
,vat-graph 45
,vat-debug 45
,vat-trace
,vat-up
,vat-down
,vat-jump 31
,vat-down
,vat-peek basilisk 'hp
,vat-jump 22
,vat-down
,vat-peek basilisk 'hp
,vat-down
,vat-peek basilisk 'hp
,quit
,quit
(define (^character _bcom name max-hp strength)
;; HP is reduced when the character is attacked, so we use a cell
;; actor to store a counter that can be modified.
(let ((hp (spawn ^cell max-hp)))
(methods
;; Getters
((name) name)
((strength) strength)
((hp) ($ hp))
;; Deal combat damage
((damage amount)
;; HP can't go below 0.
($ hp (max (- ($ hp) amount) 0))
#t) ; attacks always hit
;; Check if character is dead
((dead?)
(zero? ($ hp))))))
(define alice (with-vat p-vat (spawn ^character "Alice" 100 30)))
(define basilisk (with-vat m-vat (spawn ^character "Basilisk" 20 15)))
(define dungeon (with-vat c-vat (spawn ^dungeon alice basilisk)))
,enter-vat c-vat
,vat-resolve ($ dungeon 'player-turn)