A Journey into Just-In-Time Compilation of the JavaScript Language — Part 2

John Suthesh
4 min read · Jun 28, 2021


Rome was not built in a day

Hey Guys,

Welcome to the second part of our series. This part briefly explains how a compiler translates source code into executable machine code. As the quote at the beginning suggests, compilation is not a single-step process; it happens in multiple phases. In this article, I will briefly explain each phase, what input it receives, and what output it produces. Let’s get started.

Lexical Analysis

Lexical Analysis, or Scanning, is the first phase of the compilation process. In this phase, the source program is analyzed to break it into lexemes and generate tokens. A lexeme is a meaningful sequence of characters from the source code: keywords like “int” and “float”, variable names, literals, strings, and other elements. Each lexeme is converted into a token, which has an abstract representation as follows

<token-name, attribute-value>

The lexical analyzer also builds a symbol table to keep track of the identifiers it finds. For example, consider the following statement,

input = data * 2

For the above statement, the lexical analyzer builds the tokens as follows,

<id,1> < = > <id,2> <*> <2>

The id in the above tokens is an abstract representation of an identifier, i.e., a variable name. The attribute-value field points to the identifier’s position in the symbol table. Operators don’t need an entry in the symbol table, since they can be represented directly in their tokens. The main purpose of the symbol table is to hold metadata about the identifiers throughout the compilation process; it is consulted by multiple later phases. This sequence of tokens is then passed to the next phase, Syntax Analysis.
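The scanning step above can be sketched in a few lines of Python. This is a minimal illustration, not any real compiler’s lexer; the token names and regular expressions are my own assumptions.

```python
# A minimal sketch of a lexical analyzer for the statement "input = data * 2".
# The token categories and symbol-table layout here are illustrative only.
import re

TOKEN_SPEC = [
    ("ID",  r"[A-Za-z_]\w*"),   # identifiers such as `input`, `data`
    ("NUM", r"\d+"),            # integer literals
    ("OP",  r"[=*+/-]"),        # single-character operators
    ("WS",  r"\s+"),            # whitespace, skipped
]

def tokenize(source):
    symbol_table = []           # holds metadata (here, just the name) per identifier
    tokens = []
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    for match in re.finditer(pattern, source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "WS":
            continue
        if kind == "ID":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # the attribute-value is the identifier's position in the symbol table
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "NUM":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))   # operators carry no attribute
    return tokens, symbol_table

tokens, table = tokenize("input = data * 2")
print(tokens)  # [('id', 1), ('=', None), ('id', 2), ('*', None), ('num', 2)]
print(table)   # ['input', 'data']
```

Note how the operator tokens carry no attribute-value, while each `id` token points back into the symbol table, exactly as described above.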

Syntax Analysis

The token stream from lexical analysis is passed to the Syntax Analysis phase, whose purpose is to build a syntax tree from it. For this, the syntax analyzer uses the language’s grammar. A grammar is a set of rules describing how a source program must be structured for the compiler to understand it. For example, in English, the grammar for a complete sentence is as follows

subject verb object

This is the grammar for a complete sentence in English. If I say something like “walks John a dog”, it wouldn’t make any sense, since it doesn’t match the English grammar, so we can’t understand what the sentence means. Likewise, a compiler uses a grammar to understand the source program; it can do so only if the program matches the language grammar. A typical language grammar looks something like the following,

S -> E + T

E -> id1

T -> id2

Here S, E, and T are called non-terminals, and id1, id2 are called terminals. Using this grammar, the syntax analyzer checks the token stream for syntactic consistency. If the tokens don’t match the grammar, the compiler raises a Syntax Error. During syntax analysis, the compiler builds a syntax tree, which is passed to the next phase, Semantic Analysis.
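For the toy grammar above, the checking and tree-building can be sketched as a tiny recursive-descent parser. This is an illustrative sketch under my own assumptions (tokens as plain strings, the tree as a nested tuple), not a production parser.

```python
# A minimal recursive-descent sketch for the toy grammar above:
#   S -> E + T,  E -> id1,  T -> id2
def parse(tokens):
    # tokens is a list like ["id1", "+", "id2"]
    def expect(i, pred, what):
        if i >= len(tokens) or not pred(tokens[i]):
            raise SyntaxError(f"expected {what} at position {i}")
        return tokens[i], i + 1

    left, i = expect(0, lambda t: t.startswith("id"), "identifier (E)")
    _op, i = expect(i, lambda t: t == "+", "'+'")
    right, i = expect(i, lambda t: t.startswith("id"), "identifier (T)")
    if i != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return ("+", left, right)   # the syntax tree as a nested tuple

print(parse(["id1", "+", "id2"]))  # ('+', 'id1', 'id2')
```

Feeding it a stream that violates the grammar, such as `["id1", "id2"]`, raises a `SyntaxError`, which is exactly the behavior described above.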

Semantic Analysis

Semantic Analysis is the phase where the compiler analyses the semantics, or meaning, of the source program. For example, consider the following statement in the C programming language,

int a = “s” + 100;

The above statement is syntactically correct, and we can construct a syntax tree from it. But the statement doesn’t make any sense, since we cannot add a string and a number. It is the job of the semantic analyzer to check this kind of validity. A good analogy from English: consider the following sentence.

I killed a stone.

If you check this against the English grammar, it matches. But does it make any sense? Obviously you can’t kill a stone, since a stone is not a living thing. In the same way, the semantic analyzer checks the validity and meaning of the syntax tree. An important process that takes place during this phase is Type Checking: determining the type of each identifier and expression in the syntax tree, checking whether the types of the operands in a statement unify, and deciding whether a type conversion is needed to unify them. The code statement above is a good example. After performing semantic analysis, the compiler passes the syntax tree to the Intermediate Code Generation phase.
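Type checking over a syntax tree can be sketched as a recursive walk that computes each node’s type. The node shapes and type names below are my own assumptions for a hypothetical language that, like the C example above, refuses to add a string and a number.

```python
# A sketch of type checking over a tiny syntax tree, where nodes are tuples
# like ("num", 1), ("str", "s"), or ("+", left, right). Illustrative only.
def type_of(node):
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "+":
        left, right = type_of(node[1]), type_of(node[2])
        if left != right:
            # the operand types do not unify
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown node kind: {kind}")

print(type_of(("+", ("num", 1), ("num", 2))))   # int
# The tree for `"s" + 100` fails the check:
# type_of(("+", ("str", "s"), ("num", 100))) raises
# TypeError: cannot add string and int
```

A real type checker would also consult the symbol table for identifier types and insert implicit conversions where the language allows them; this sketch only shows the unification check.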

Intermediate Code Generation

Intermediate code generation is the process of converting the syntax tree into an intermediate representation, which can then be used to generate machine-dependent binary code. The phases up to and including intermediate code generation are called the frontend; their purpose is to produce an intermediate representation that can be handed to a backend for machine-dependent code generation. With n frontends and m backends, we can build n * m compilers by pairing each frontend with each backend. Intermediate representations come in multiple forms, such as three-address code, abstract syntax trees, and bytecode.
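As a concrete illustration of one of those forms, here is a sketch of lowering a syntax tree to three-address code, where every instruction has at most one operator and results go into fresh temporaries. The node shapes and temporary-naming scheme are my own assumptions, continuing the `input = data * 2` example from earlier.

```python
# A sketch of generating three-address code from a tiny syntax tree.
# Nodes are tuples like ("id", name), ("num", value), or (op, left, right).
def gen_tac(node):
    code = []
    counter = 0

    def walk(n):
        nonlocal counter
        if n[0] in ("id", "num"):
            return str(n[1])        # leaves are used directly as operands
        left = walk(n[1])
        right = walk(n[2])
        counter += 1
        temp = f"t{counter}"        # fresh temporary for this operation's result
        code.append(f"{temp} = {left} {n[0]} {right}")
        return temp

    return walk(node), code

# Right-hand side of: input = data * 2
result, code = gen_tac(("*", ("id", "data"), ("num", 2)))
code.append(f"input = {result}")
print("\n".join(code))
# t1 = data * 2
# input = t1
```

Because every instruction names at most two operands and a destination, this form maps naturally onto machine instructions, which is why the backend can consume it regardless of which frontend produced it.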

In the next article, we will briefly go through the backend phases of the compilation process and see how a compiler generates machine code from the Intermediate Representation.
