A Journey into Just-In-Time Compilation in JavaScript — Part 4
Hi everyone! In this article, we will cover the code optimization and code generation phases of the compilation process.
Code Optimization
The Intermediate Representation (IR) constructed in the previous phase represents the entire code structure. If we generated native code or bytecode directly from this IR, the result would be inefficient in both speed and size. To produce faster and smaller code, the compiler performs optimizations both on the IR itself and after the native code has been generated.
There are two kinds of code optimization in a compiler: machine-independent optimization and machine-dependent optimization.
Machine Independent Optimization
Let’s get started with machine-independent code optimization. In this type, the compiler performs the optimization directly on the IR.
For example, there are cases where the same arithmetic operation is performed in multiple places within a given scope, such as using an index variable to calculate the offset into an array, as in the following code snippet.
int a[100];
for (int i = 0; i < 100; i++) {
    a[i] = a[i] + 10;
}
For this loop, the compiler generates an IR in three-address code along the following lines.
t = a
i = 0
loop:
if i >= 100 goto end
t1 = i * 4
t2 = t[t1]
t3 = t2 + 10
t4 = i * 4
t[t4] = t3
i = i + 1
goto loop
end:
In the above three-address code, we can observe that the arithmetic operation i * 4 is computed twice inside the same scope. Such a repeated expression is called a local common subexpression, and the redundant computation can be eliminated from the IR by machine-independent optimization. Other techniques at this level, such as dead-code elimination, the common-subexpression elimination we just saw, peephole optimization, and so on, can likewise be used to increase the efficiency of the generated code.
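As a sketch, common-subexpression elimination would rewrite the loop above so that i * 4 is computed only once per iteration and the temporary is reused:

```
t = a
i = 0
loop:
if i >= 100 goto end
t1 = i * 4      ; computed once
t2 = t[t1]
t3 = t2 + 10
t[t1] = t3      ; t1 reused instead of recomputing i * 4
i = i + 1
goto loop
end:
```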
Machine-independent optimization techniques draw on Boolean algebra and theorems from discrete mathematics. One branch of mathematics used heavily here is order theory, which underpins data-flow analysis. Data-flow analysis can in turn be used for live-variable analysis, which determines how long each variable’s value remains needed. A long-lived variable can then be kept in a register instead of memory, eliminating the latency of fetching its value from memory. We will discuss these concepts in more depth in later articles.
Machine Dependent Optimization
In machine-dependent optimization, the compiler exploits details of the target CPU architecture. So what is the difference between the two? In machine-independent optimization, the compiler relies on mathematics, rules of inference, and Boolean algebra. In machine-dependent optimization, the transformations depend heavily on the ISA of the target CPU architecture. For example, consider the x86_64 assembly that a compiler emits for the loop from the previous section.
In that generated assembly, the compiler uses a multiplication instruction to compute i * 4 from the previous code. This can be optimized by replacing the multiplication with a shift-left operation, depending on the number being multiplied. In this case it is a multiplication by 4, which is nothing but multiplication by 2². Any multiplication by 2^n can be replaced by number << n, which shifts the number n bits to the left. This transformation, known as strength reduction, is an example of a machine-dependent optimization that increases the efficiency of the code.
Code Generation
After performing code optimization, the compiler generates code from the optimized IR. Sometimes the compiler performs further optimizations after generating the code to improve its efficiency.
The compiler generates code from the IR by first splitting it into basic blocks: straight-line chunks of code that can only be entered at the top, typically through labels, and exited at the bottom.
The basic blocks are arranged into a flow graph whose directed edges represent control flow between blocks. The statements in each basic block are then converted into a DAG (Directed Acyclic Graph).
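As a sketch, the three-address code from earlier splits into the following basic blocks, with the flow-graph edges noted in comments:

```
B0:                      ; entry, falls through to B1
t = a
i = 0

B1 (loop):               ; edges: B1 -> B3 (taken), B1 -> B2 (fall-through)
if i >= 100 goto end

B2:                      ; edge: B2 -> B1 (back edge of the loop)
t1 = i * 4
t2 = t[t1]
t3 = t2 + 10
t4 = i * 4
t[t4] = t3
i = i + 1
goto loop

B3 (end):                ; exit block
```

Each block ends at a jump or a label, which is what makes its control flow easy to analyze.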
The purpose of representing the statements of a basic block as a DAG is that the compiler can then use tree-tiling methods for instruction selection, generating code by covering the DAG with machine-instruction patterns. In this phase, the compiler also performs register allocation, live-variable analysis, and other operations. We will go through these topics in detail in later articles.
So far, we have covered a brief introduction to all the phases of compilation. In the upcoming articles, we will walk through the workings of the JavaScript engine SpiderMonkey and see how it compiles code and executes it in its virtual machine.