
Monday, December 2, 2013

UNDERSTAND COMPILER.............ARTICLE 14

COMPILER

In my last post I discussed the pre-processor and how it works. Let's look at the part the compiler plays in the process of compilation. Note that the terms compilation and compiler do not mean the same thing. Compilation is the whole process of converting the source code of a program into a binary executable file.
                                                       
The compiler is the tool we use to convert the pre-processed code into assembly instructions. If you have not read my last post on the pre-processor, I recommend reading that article first, because this article continues from it.

WHAT IS THE COMPILER???

A compiler is a tool. The compilation process has 4 major phases: pre-processing, compiling, assembling and linking. The compiler is the tool used in the second phase. It converts the pre-processed output generated by the pre-processor into assembly instructions.

HOW THE COMPILER WORKS???

I am assuming that whoever is reading this is already familiar with the first phase of compilation, i.e. the pre-processor and how it works. The compiler comes into action in the second phase of compilation. By convention, the output generated by the pre-processor has a " .i " extension. A file with a " .i " extension contains pre-processed code.

For example: anyfilename.i means that the file is pre-processed output.

The compiler takes the pre-processed output of the pre-processor and converts it into assembly instructions, which are an intermediate code (not yet machine-level code).
In the last article I wrote a program to add two numbers and pre-processed the source code, which generated the output add1.i. This pre-processed output is the input to the compiler.

INVOKING THE COMPILER:

Here is the command that performs the second phase of compilation and stops after that step.

                                            gcc  -S  add1.i   -o   add2.s

Here I am instructing gcc to invoke the compiler tool. The " -S " flag tells gcc to run the compiler on add1.i and to stop once the assembly code has been generated, without going on to assemble or link it. The compiler processes the pre-processed file add1.i and converts it into assembly instructions. The " -o " flag writes that assembly code to a file called " add2.s ". The " .s " extension indicates that the file contains assembly code.

Check out the screenshots below:
  This shows that in my temp directory I have the source code file add.c and the pre-processed output add1.i. Now let's invoke the compiler and convert add1.i into the assembly file add2.s.

 Now observe the above screenshot closely. I executed the command to perform the second phase of compilation and then a command to list the files in my directory. I now have three files: add2.s has been added to the directory temp.

add2.s is the assembly output produced by gcc. Let's use the " -v " flag to get verbose output and see exactly what steps gcc takes.

command:    gcc  -S  -v  add1.i  -o  add2.s

Executing the above command displays the steps gcc takes to generate the assembly instructions. The output looks like this.


The first thing gcc does is read its " specs ", which is short for specifications. The specs are installed along with gcc (recent versions report "Using built-in specs." at this point) and tell gcc which tools to run and which options to pass to them. The next step is the target: gcc reports the platform for which it is generating code. On my machine it shows " Target: i686-linux-gnu ".

After this step gcc performs some verification and validation, and then collects the options (also called flags or switches) we have provided to it.

COLLECT_GCC_OPTIONS='-S' '-v' '-o' 'add2.s' '-mtune=generic' '-march=i686'

The flags we did not type ourselves, here -mtune=generic and -march=i686, are added by gcc itself.

Observe the screenshot below carefully.
 
 
gcc will now call the tool named " cc1 ". cc1 is the tool that compiles the pre-processed code into assembly code. cc1 can perform both the pre-processing and the compiler operations; which one it performs depends on the flag we provide. If we give the " -E " flag, cc1 performs the pre-processing, and if we give the " -S " flag, cc1 acts as a compiler. One more important thing to observe is that the output shows the file is already pre-processed (the -fpreprocessed option on the cc1 command line). I have highlighted it.
                                                     
                                                            In the next step it loads all the dependent components and converts the pre-processed code into assembly code.

LET'S OPEN THE ASSEMBLY FILE add2.s

file is a command we can use to determine the type of a file. The syntax is very simple: file <filename>

It shows that add2.s is ASCII program text, which means it can be opened with a normal text editor. We will use the Vim editor to open the file (syntax: vim <filename>).

 The assembly instructions gcc generates are specific to the architecture. My machine uses the x86 architecture, so gcc generates assembly output for x86. When gcc targets a different architecture, it generates assembly for that architecture instead.

                                                The assembly output is architecture specific because architectures differ in their instruction sets and in the number and names of their registers. A processor understands only its own instruction set. The registers in the ARM architecture, for example, are different from those in x86. Hence the compiler generates assembly code specific to the target architecture.

Observe the assembly output carefully. You can find the main function under the label " main: ".
If you are new to this, the assembly code will be difficult to follow at this point. I will write a future article on how to understand assembly code.
