In the last episode, I have showed a rather important limitation of the tiny-interp
interpreter:
def cond():
x = 3
if x < 5:
return "yes"
else:
return "no"
Control flow and function calls were not handled, as a result tiny-interp
could
not interpret the above code fragment.
In the following, I’ll ditch tiny-interp
and switch to the “real” pygo
interpreter.
Real Python bytecode
People having read the AOSA article know that the structure of the bytecode of
the tiny-interp
interpreter instruction set is in fact very similar to the
one of the real python bytecode.
Indeed, if one defines the above cond()
function in a python3
prompt and
enters:
### bytecode as raw bytes
>>> print(cond.__code__.co_code)
b'd\x01\x00}\x00\x00|\x00\x00d\x02\x00k\x00\x00r\x16\x00d\x03
\x00Sd\x04\x00Sd\x00\x00S'
### bytecode as numbers
>>> print(list(cond.__code__.co_code))
[100, 1, 0, 125, 0, 0, 124, 0, 0, 100, 2, 0, 107,
0, 0, 114, 22, 0, 100, 3, 0, 83, 100, 4, 0, 83,
100, 0, 0, 83]
This doesn’t look very human friendly.
Luckily, there is the dis
module that can ingest low-level bytecode
and prints it in a more human-readable way:
>>> import dis
>>> dis.dis(cond)
2 0 LOAD_CONST 1 (3)
3 STORE_FAST 0 (x)
3 6 LOAD_FAST 0 (x)
9 LOAD_CONST 2 (5)
12 COMPARE_OP 0 (<)
15 POP_JUMP_IF_FALSE 22
4 18 LOAD_CONST 3 ('yes')
21 RETURN_VALUE
6 >> 22 LOAD_CONST 4 ('no')
25 RETURN_VALUE
26 LOAD_CONST 0 (None)
29 RETURN_VALUE
Have a look at the official dis
module documentation for more informations.
In a nutshell, the LOAD_CONST
is the same than our toy OpLoadValue
and LOAD_FAST
is the same than our toy OpLoadName
.
Simply inspecting this little bytecode snippet shows how conditions and branch-y
code might be handled.
The instruction POP_JUMP_IF_FALSE
implements the if x < 5
statement from the
cond()
function.
If the condition is false
(i.e.: x
is greater or equal than 5
), the interpreter
is instructed to jump to position 22
in the bytecode stream, i.e. the return "no"
body of the false
branch.
Loops are handled pretty much the same way:
>>> def loop():
... x = 1
... while x < 5:
... x = x + 1
... return x
...
>>> dis.dis(loop)
2 0 LOAD_CONST 1 (1)
3 STORE_FAST 0 (x)
3 6 SETUP_LOOP 26 (to 35)
>> 9 LOAD_FAST 0 (x)
12 LOAD_CONST 2 (5)
15 COMPARE_OP 0 (<)
18 POP_JUMP_IF_FALSE 34
4 21 LOAD_FAST 0 (x)
24 LOAD_CONST 1 (1)
27 BINARY_ADD
28 STORE_FAST 0 (x)
31 JUMP_ABSOLUTE 9
>> 34 POP_BLOCK
5 >> 35 LOAD_FAST 0 (x)
38 RETURN_VALUE
The above bytecode dump should be rather self-explanatory.
Except perhaps for the RETURN_VALUE
instruction: where does the
instruction return to?
To answer this, a new concept must be introduced: the Frame
.
Frames
As the AOSA article puts it:
A frame is a collection of information[s] and context for a chunk of code.
Whenever a function is called, a new Frame
is created, carrying a data stack
(the local variables we have played with so far) and a block stack (to handle
control flow such as loops and exceptions.)
The RETURN_VALUE
instructs the interpreter to pass a value between Frames
,
from the callee’s data stack back to the caller’s data stack.
I’ll show the pygo
implementation of a Frame
in a moment.
Pygo components
Still following the blueprints of AOSA and byterun
, pygo
is built on
the following types:
-
a
VM
(virtual machine) which manages the high-level structures (call stack of frames, mapping of instructions to operations, etc…). TheVM
is a slightly more complex version of the previousInterpreter
type fromtiny-interp
, -
a
Frame
: everyFrame
value contains a code value and manages some state (such as the global and local namespaces, a pointer to the callingFrame
and the last bytecode instruction executed), -
a
Function
to model real Python functions: this is to correctly handle the creation and destruction ofFrames
, -
a
Block
to handle Python block management on to which control flow and loops are mapped.
Virtual machine
Each value of a pygo.VM
must store the call stack, the Python
exception state and the return values as they flow between frames:
type VM struct {
frames Frames // call stack of Frames
fp *Frame // pointer to current Frame
ret Value // return value
exc Exception // last exception
}
A pygo.VM
value can run bytecode with the RunCode
method:
func (vm *VM) RunCode(code Code, globals, locals map[string]Value) (Value, error) {
frame := vm.makeFrame(code, globals, locals, vm.fp)
return vm.runFrame(frame)
}