Differences between compilation types

What is a compiler?

A compiler (e.g. gcc, clang etc.) is something that transforms code from one representation into another. It usually accomplishes this through a series of subtasks. A simplified breakdown of the stages might look something like:

Frontend (lexing, parsing, performing semantic analysis...)
|
--> Generates an intermediate representation (IR)
|
Optimizations (remove dead code, common subexpression elimination...)
|
--> Also an IR
|
Code generation (scheduler, register allocator, machine assembly generator)

Before we continue, here is a quick example you can run yourself to see what is going on. Take the following C code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (void)
{
	char h[] = "the quick brown fox jumps over the lazy dog";
	char *f = strstr(h, "fox");
	if (f) {
		printf("%s\n", f);
	}
	return 0;
}

You’ll need clang and llvm. Then compile this program as follows:

  1. clang -emit-llvm -S str.c -o str.ll - Examine the generated LLVM IR
  2. opt -O3 -S str.ll -o str.opt.ll - Examine the optimized code, note that it contains fewer lines than the str.ll file
  3. llc -O3 str.opt.ll -o str.s - Examine the machine assembly
  4. clang str.s - Finally create a binary

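This is a contrived example: the same thing can be accomplished by simply writing clang -O3 str.c. (On recent clang versions, functions compiled without optimization are marked optnone, so opt in step 2 may leave the IR largely unchanged; passing -Xclang -disable-O0-optnone in step 1 avoids this.)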

This definition doesn’t just cover translating a high level language into a low level one; it also includes things like transforming PHP 7.0 code into opcodes.

Types of code transformation

Interpreting

There are a ton of high level languages that use something called a Virtual Machine to interpret the high level representation of their code. PHP and Python are among these. The most popular of these you’ve probably heard about before is the JVM, which enables running Java (or things like Scala).

I won’t go much into interpreters since we’re mostly covering compilation in this post. However, the way it works is that the VM basically acts like a huge switch statement inside of a loop, and code is processed as such.

AOT Compilation

This section will talk about AOT (ahead of time) compilation and what it means. This is the most popular way to compile statically typed languages today and covers languages such as Rust, Go etc.

AOT often generates extremely efficient code. In fact, we can often write high-level code that looks less efficient and let the compiler optimize it for us at build time. An example of this can be found here: instead of comparing each byte, which one may assume would be faster, simply casting to a string and comparing is just as quick (and reads a lot more nicely).

AOT compilation can often take a long time. However, this is simply because we’re using high level abstractions and trying to compile them down into efficient machine code. These translations take time to run, especially when optimizing.

Also, we may be performing other non-trivial steps such as enforcing constraints that make the particular language safe at runtime.

JIT Compilation

In the grand scheme of compilation, JIT (just in time) compilation has only become popular fairly recently. It combines a VM with compilation to machine code at runtime, and is used by implementations of languages such as JavaScript (e.g. V8) and Lua (LuaJIT).

The basic premise of this approach is to start running the code in a VM at first (big switch statement, as above) but take note of which parts of the code are “hot” (meaning that this code runs often enough to be worth compiling to machine code) and to collect information on which types are being used most often.

What happens next is that it pauses execution, compiles the hot pieces of code it found during the VM phase, and then continues. From then on the compiled versions are used where they exist, and everything else keeps running in the VM.

The main advantage to this approach is that you get the best of both worlds discussed above: rapid idea iteration without a separate compilation step, thanks to the VM, and compiled machine code for the code that is run most often.

Conclusion

My goal for this post was to get a better overview of how the code I write day to day is executed. I think it accomplishes that, alongside a quick dive into the different types of compilers out there. Overall, I’m excited to see what happens with the PHP JIT RFC, which has been accepted for PHP 8.0, and the FFI RFC, which has also been accepted, for PHP 7.4.