A Journey into Just-In-Time Compilation in Javascript Language — Part 3
Hey Guys,
In my previous post, I explained the frontend phases that take place when a programming language is compiled. In this article, I will explain the Intermediate Representation in a compiler, answering questions such as what an Intermediate Representation is and why the compiler creates one. So let’s get started.
Intermediate Representation
An Intermediate Representation (IR) represents the source code in a structure that the backend of the compiler can later use to generate native executable code. You might be wondering why we need this Intermediate Representation instead of generating the native binary code directly from the Abstract Syntax Tree.
The Intermediate Representation we are discussing here is different from the Abstract Syntax Tree, which is itself a kind of Intermediate Representation. Most compilers use the Abstract Syntax Tree to generate a further Intermediate Representation that carries the extra metadata and information the compiler backend needs.
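To make the step from Abstract Syntax Tree to Intermediate Representation a little more concrete, here is a minimal sketch that lowers a small arithmetic expression into three-address code, one common textbook style of IR. It borrows Python's standard `ast` module as a ready-made frontend. The function name `lower_to_tac`, the tuple layout and the temporary names `t1`, `t2` are my own assumptions for illustration, not the format of any real compiler.

```python
import ast
from itertools import count


def lower_to_tac(source):
    """Lower an arithmetic expression into a toy three-address-code IR.

    Each instruction is a tuple: (operation, destination, left, right).
    """
    tree = ast.parse(source, mode="eval")  # the frontend gives us an AST
    temps = count(1)                       # generator for temporary names
    code = []
    ops = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}

    def visit(node):
        # Leaves: variables and constants are used as operands directly.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        # Inner nodes: lower both operands first, then emit one instruction.
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            left = visit(node.left)
            right = visit(node.right)
            dest = f"t{next(temps)}"
            code.append((ops[type(node.op)], dest, left, right))
            return dest
        raise NotImplementedError(type(node).__name__)

    visit(tree.body)
    return code


tac = lower_to_tac("a + b * 2")
for op, dest, left, right in tac:
    print(f"{dest} = {op} {left}, {right}")
```

The tree shape of the expression is gone in the output. What remains is a flat list of simple instructions, which is much closer to what a code generator wants to consume.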
Why do you need an Intermediate Representation?
Consider a scenario where you need to build a compiler for the same programming language on multiple CPU architectures. A compiler built to emit the binary language of one CPU architecture cannot be used to generate executables for a different one, because the instruction set of one CPU architecture cannot be understood by another. In this case, you need a common representation of the source program that can be handed to different backends, each of which generates native binary code for its own CPU architecture. The term backend simply means the code generator of the compiler. As long as multiple backends understand the same Intermediate Representation, a frontend can be paired with any of them. An example of an Intermediate Representation is the bytecode of the Python programming language, shown below.
The instructions given in the image are the disassembled bytecode. In the standard CPython implementation this bytecode is usually executed by the interpreter's virtual machine, while a JIT-compiling backend can use the same kind of representation as the input for native code generation.
The Intermediate Representation allows us to build compilers for different CPU architectures without redoing the work of the compiler frontend for each architecture. Another advantage of using an Intermediate Representation is that we can write multiple frontends, one per programming language, that all generate the same Intermediate Representation, and connect them to a single CPU architecture backend that understands it. In this way, we can reuse a single backend for multiple programming languages.
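If you want to look at this kind of disassembled bytecode yourself, Python ships a standard `dis` module for exactly that. The snippet below is a small sketch. The function `add` is just a made-up example, and the exact instruction names you see will differ between Python versions, so I am not showing an expected listing here.

```python
import dis


def add(a, b):
    # A tiny function whose bytecode is easy to read.
    return a + b


# Collect the opcode names programmatically...
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)

# ...or print the human-readable disassembly listing.
dis.dis(add)
```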
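Here is a rough sketch of that idea: one toy IR and two backends that each turn it into a different textual output. Please treat everything here as an assumption for illustration. The IR tuples, the function names `backend_x86_like` and `backend_arm_like`, and the assembly-looking strings are simplified stand-ins (they even use temporaries like `t1` in place of real registers) and are not valid output for any real assembler.

```python
# Toy IR shared by every backend: (operation, destination, left, right)
IR = [
    ("mul", "t1", "b", "2"),
    ("add", "t2", "a", "t1"),
]


def backend_x86_like(ir):
    """Emit two-operand pseudo-instructions (destination is also a source)."""
    mnemonics = {"add": "add", "mul": "imul"}
    lines = []
    for op, dest, left, right in ir:
        lines.append(f"mov {dest}, {left}")               # copy left operand
        lines.append(f"{mnemonics[op]} {dest}, {right}")  # combine with right
    return lines


def backend_arm_like(ir):
    """Emit three-operand pseudo-instructions (separate destination)."""
    return [f"{op} {dest}, {left}, {right}" for op, dest, left, right in ir]


# The same IR goes into both backends; the frontend work is not repeated.
for name, backend in [("x86-like", backend_x86_like), ("arm-like", backend_arm_like)]:
    print(name)
    for line in backend(IR):
        print("   ", line)
```

The point to notice is that neither backend knows which source language produced the IR, and the code that produced the IR does not know which backend will consume it.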
Example
For example, suppose you have a backend that generates native binary code for the Intel x86_64 CPU architecture and understands a specific Intermediate Representation called NativeCode. If you want to create a compiler for the C programming language targeting x86_64, all you need to do is build a frontend that parses C and produces NativeCode, which can then be passed to that backend.
As I mentioned in the previous post, if we have m frontends for m programming languages and n backends for n CPU architectures, we can create m * n compilers by pairing each frontend with each backend, even though we only had to write m + n components.
In the upcoming article, we will discuss the code generation and code optimization techniques that the compiler uses to generate efficient binary code.
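The m * n counting can be sketched in a few lines. The language and architecture names below are only placeholders, and `make_compiler` is a hypothetical helper that fakes the compilation with strings, so this shows the combinatorics and nothing more.

```python
from itertools import product

frontends = ["C", "Rust", "Swift"]   # m = 3 placeholder source languages
backends = ["x86_64", "arm64"]       # n = 2 placeholder target architectures


def make_compiler(frontend, backend):
    """Pair one frontend with one backend through a shared (fake) IR."""
    def compile_source(source):
        ir = f"IR[{frontend}]({source})"   # what the frontend would produce
        return f"{backend} code for {ir}"  # what the backend would produce
    return compile_source


# Every (frontend, backend) pair gives one complete compiler.
compilers = {
    (f, b): make_compiler(f, b) for f, b in product(frontends, backends)
}

print(len(compilers))  # m * n pairs from only m + n components
print(compilers[("C", "arm64")]("main.c"))
```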