A Journey into Just-In-Time Compilation in JavaScript Language — Part 5 — An overview of Initial Execution phases in SpiderMonkey

8 min readFeb 5, 2022

Hey guys, In this article we will be going through the abstract overview how the SpiderMonkey — JavaScript Engine for Mozilla Firefox, generates bytecodes from the JavaScript source code and passes it to the Bytecode Interpreter. We will be going through how the JavaScript engine builds the Abstract Syntax Tree from the JavaScript Source text. This article is based on the notes that I have prepared while studying the source code of the SpiderMonkey JavaScript Engine. If there are any discrepancy or misinformation, let me know because I’m still learning.

Parse Tree

The Parser used by the SpiderMonkey can build AST(Abstract Syntax Tree) directly from the source code rather than separating the tokenizing and parsing phases into separate parts. The parsing algorithm is kind of recursive descent parser which uses a function definition for each productions in the Language grammar. For each production, the Parser builds a new parse node and attaches the parse node to the already built parse tree as the parsing goes on. This process goes on until the parses reads the EOF( End-Of-File) token from the token stream. The driving code for both compilation and interpreting initially builds a parse tree to check whether the Source code has no syntax error before going into Bytecode generation. Once this dummy parse tree is constructed without any syntax errors, the driver code will start the real parsing which will result in bytecode generation and execution of the generated bytecode. When the parsing completes, a parse tree will be built and it will be used the BytecodeEmitter to generate Bytecode for the constructed the Parse tree. This emitted bytecode will be executed on an Interpreter, that will interpret each bytecode which are generated by the Bytecode Emitter.

A typical AST(Abstract-Syntax-Tree) generated by the SpiderMonkey’s parse tree constains a root node with multiple adjacent nodes pointed by LinkedList data structure. The Parser implements a base class called ParseNode which are then inherited by different ParseNode classes, each for a different language construct. To see the list of all the ParseNodes, refer the file js/src/frontend/ParseNode.h. The constructed AST can be as follows,

Consider the following JavaScript code snippet

var calc = {
	operand1:"Hello"}

The generated Parse tree will be as follows,

On each blocks in the above figure, the naming convention is NodeName- NodeType. The names on the edges are the field names in the NodeName structs which inherits the NodeType classes. This Parsetree will be used by the Bytecode Emitter for generating bytecode.

Script Stencil

SpiderMonkey initially generates bytecode from the Source text and runs them on an Interpreter environment. This Interpreter environment consumes a data format called Stencil. The Stencil is the frontend representation created by the Mozilla Firefox for representing the Script to be executed by the Interpreter. The Stencil is a collection of data structures that are used by the interpreter for execution. The Stencil organises all the necessary data such as the object literals, BigInt Literals, scopes into vectors which are pointed by indexes rather than Graphs. The reason for this representation is for faster serialization and faster disk I/O accesses.

GC Things

The SpiderMonkey creates an instance of BytecodeEmitter class and proceeds with the bytecode emission. This BytecodeEmitter class will hold the emitted bytecode and the other related data which will be used by the Bytecode Interpreter. Inside this BytecodeEmitter object, there is a field called perScriptData_ with the type PerScriptData. This field is used for storing data that are not associated with any bytecode, but will be used by the bytecode during the execution in the Interpreter. Inside this PerScriptData class, it maintains a list of emitted objects(StringsLiterals, BigInts. etc) called GCthingList. The GC stands for Garbage-Collectable things which means they are allocated on the Garbage Collector heap rather than inside the Maybe container. The Maybe container is a container class inside SpiderMonkey which is used for storing objects that are not allocated inside the Garbage Collector heap. This GCThingList holds all the objects that are encountered by the BytecodeEmitter and they are used by the Interpreter when the code is being executed. After the entire bytecode emission is completed, the driver code will use this BytecodeEmitter object for creating the ScriptStencil data structure which will be executed by the Interpreter. For each and every language construct, such as loop statements, decision statement and functions, the SpiderMonkey uses a state-transition based algorithm for generating bytecodes for those language constructs.

BaseScript

The SpiderMonkey performs code compilation and optimization in multiple levels and phases. On each stage, more optimized version of the previous code will be generated by the compiler. All this profiling information and the generated code is stored on this BaseScript object. Before we go into how the Scripts are stored in BaseScript, let me show you the two different type of BaseScripts that are created by the SpiderMonkey.

LazyScript — A LazyScript is a BaseScript type which only has AST. It doesn’t have bytecode or any other GCthings will be emitted for this Lazy Script. All the LazyScripts will go through a process called delazification before they are to be executed by the Interpreter.
Non-LazyScript — A NonLazy Script is a BaseScript type which has all the GCthings and bytecode generated for Source text. The Non-LazyScript can be executed directly by the Interpreter without any delazification process.

The LazyScripts and Non-LazyScript can be identified by calling the hasBytecode method from the BaseScript object.

Temporal Dead Zone Cache

Temporal Dead Zone cache is an object which keeps track of all the lexically declared variable names inside a specific scope. For each scope, there will be one Temporal Dead Zone cache, which will contain all the lexically declared and initiated variable names. The reason for having an object like this is to have a lookup on whenever the bytecode emitter finds a variable name inside a scope in the AST, it does not need to be checking whether the variable has been initiated or not. Whenever the bytecode emitter enters a new scope with lexical variable names, they all will be marked as CheckTDZ and they will be moved to the TDZCache. When a variable inside a scope is accessed, the JSOp::CheckLexical opcode is emitted and the variable is marked as DontCheckTDZ.

After Parsing

Once the parsing is done and the SpiderMonkey starts to emit Bytecode for the constructed ParseTree. Once the parsing is done, the SpiderMonkey will be checking the ParseTree for any constant expression to perform Constant Folding. Constant Folding is a process of converting a constant expression ( an expression without any variables) into its computed value. Once the Constant Folding is done, the SpiderMonkey will start the emit the Bytecode for the constructed ParseTree. Before emitting the bytecode, the SpiderMonkey need to set up few more additional classes inside the BytecodeEmitter object which are used for storing few other information like the Scopes, Declared variables, etc. The SpiderMonkey uses the function emitScript from BytecodeEmitter class as a driver code for the bytecode generation operation.

Before emitting the bytecode, the SpiderMonkey initiates the Scopes, Temporal Dead Zone Caches and the EmitterScope object. The Scopes are used to note down the scope on which the bytecode emitter is currently on. The TDZCache or Temporal Dead Zone Cache is used to keep track all the declared lexical variables that are inside a specific scope. The EmitterScope object is used to keep track of all the objects that are emitted for a specific scope. The emitScript function will the handling the following scopes,

Eval Scope — The scope created by the eval function in the JavaScript code.
Global Scope — The default scope which contains all the JavaScript in a js file.
Module Scope — The scope created by the JavaScript code that are imported into the JavaScript code using import keyword.

For now, we will be focusing on how the emitScript function setup the Global scope. Initially, the emitScript function will create a EmitterScope object and updates the emitterScope_ field in the BytecodeEmitter object. After this process, the emitScript function will be updating the variable names for each scope in the NameCache type field present in the EmitterScope object. Then the EmitterScope retrieves the collected names using the ensureCache function which from EmitterScope object.

During this process, the scope creation makes use of the bindings information for that particular scopes. The bindings are the mapping from the variable names in the NameCache to their location in the Scope ,i.e, on which outer scope or the current scope, they are being defined. This scope information or lexical environment contains the objects that are defined for that scope in the JavaScript code. This information is present in the AbstractData structure inside each of the following scope classes

LexicalScope
FunctionScope
VarScope
GlobalScope
EvalScope
ModuleScope
WasmInstanceScope
WasmFunctionScope

All these scope classes have an AbstractData structure that is defined differently for each scope class depending on their usage. The bindings in the AbstractData structure is checked by the createForGlobalScope from the ScriptStencil class. The createForGlobalScope function uses MarkParserScopeData function to mark all the bindings inside the global scope as used by the ScriptStencil.

After this operation, the enterGlobal function from the BytecodeEmitter verifies whether there are any bindings left out from marking. If it finds any left out bindings, it will be traverse through them and maps the binding names to their location in global scope in the NameLocation cache.

After this global scope setup process, the emitScript function will be initializing all the declarations in the ParseTree using the emitDeclarationInitiation function. This function is responsible for initialization of all the bindings in the global context. Then the function definitions in the global scopes are being hoisted, which means the bytecode for those functions are emitted during this stage. This functionality is carried out by the defineHoistedTopLevelFunctions function. Then emitDeclarationInstantiation function will emit the bytecode for GlobalOrEvalDeclInstantiation which will instruct the interpreter to initialize the bindings in the global scope.

After this process, the bytecode generation for the Parse Tree will be initiated using the emitTree function from BytecodeEmitter object. At each bytecode emission, the emitCheck function will be checking the bytecde wheterh is has JOF_IC flag. if it does, it will be incrementing the numICentires field in the BytecodeSection class. IC’s are called Inline Caches, one of the ways for implementing the polymorphic code generated by the Compiler. Once the Bytecode emission is completed, the ScriptStencil is created from this generated bytecode by the IntoScriptStencil function. This process carries on by first creating the ImmutableScriptData from the bytecode generated. The createImmutableScriptData function from the BytecodeEmitter class does this. This function will be initialising the ImmutableScriptData class with the bytecode generated. Then the objects in the GCThingsList is copied into the ScriptStencils gcthings field. Then the SharedImmutableScriptData object is created with the ImmutableScriptData object and is placed in the sharedData field in the ScriptStencil.

A point to be noted here is that, the scope information for each level scope is recorded in the CompilationStencil class where scriptData keeps track of these things. Each element in the scriptData is treated as a new scope. This information is recorded during the Parsing

After all this process, the entire ScriptStencil will be moved into the JSScript object. The JSScript object holds the ultimate data that will be used by the Interpreter. The emitted bytecode is located at the sharedData_ field from the JSScript object, which is an SharedImmutableScriptData types object. The SharedImmutableScriptData consists of the pointer the ImmutableScriptData which calls a static function called offsetofCode which returns the pointer to the emitted bytecode from the JitCode class.

Once the JSScript object is initialized and is ready for execution, it will be passed to the Interpreter for execution, which will begin to execute the generated JSScript.