
Thursday, November 28, 2013



We have already seen how to compile a program with the gcc compiler, and each step involved in the process of compilation. gcc is a compiler toolchain, i.e. a collection of tools, and the gcc command acts as a wrapper for all of them. In the process of compilation we came across 4 steps that gcc goes through to turn a program into an executable binary file. (Refer to my old article, stages of compilation.)

The first phase of the compilation process is the pre-processing phase, where the input program is pre-processed to generate an output that the other tools in the compilation process take as their input. Let's now study the pre-processor and learn how it processes the .c file.


Pre-processing can be described in general as a process where we prepare something for a specific application so that it produces a better outcome. In computer science, pre-processing is a process where we take input data or a program, make certain modifications to it, and produce an output which will be used by another application.


The pre-processing phase is the first step in the stages of compilation. The pre-processor processes all the " # " directives in the program, like #include, #define, #ifdef, #endif etc., and substitutes them with the necessary code. This is the general functionality of the pre-processor.


The tool gcc uses to perform the pre-processing operation is called the pre-processor. When we give a program as input to gcc, it will normally convert the program straight into a binary executable, but we can also instruct gcc to stop after performing the first phase of compilation.

Consider a C program called add.c. It's a very simple program for the addition of two numbers.
The program takes two numbers a and b, adds the values and stores the output in the variable c. The program has only a single print statement. To compile this program normally we use the command gcc <<filename.c>> and an executable binary file a.out will be created.

Let us perform only the first stage of compilation by instructing gcc to do only the pre-processing. gcc then invokes the pre-processor and stops. All we have to do is type the following command:

command:   gcc <<filename.c>>   -E  -o  <<outputfilename.i>>

Here gcc is given a special flag " -E " after the filename. This flag instructs gcc to stop after the pre-processing stage, and " -o " names the file the output is written to. By convention the output file gets a " .i " extension, which indicates that it is a pre-processed C file.

Let's pre-process the code for adding the two numbers.

What I have done above is pre-process the file add.c and push the output into a file add1.i.
If you look carefully at the above screen-shot, you can see that when I ran the ls command it displayed a file called add1.i in green font, which is the pre-processed file.
There is a command called " file " which takes a file name as input and displays the file type as the output.
Let's check the file type of add1.i.

Here it shows that add1.i is an ASCII text file, which means it can be opened using a regular text editor.
Let's open the pre-processed file add1.i using the vim text editor. The command to open add1.i is
vim add1.i

The above screen-shot displays only a small part of the entire file. I have scrolled down to about 19% of the file, which is shown at the bottom right of the screen-shot. Check out the video below.

In this video I show you the entire pre-processed output of the add.c program. The pre-processor processed all the " # " directives, i.e. in this context the #include <stdio.h> line. At the end of the clip I clearly show that only the # directives were processed by the pre-processor and the rest of the code is left untouched.

Hence we can conclude that the pre-processor only acts on the # directives, i.e. the lines which start with " # ", and on any macros those directives define. The #include <stdio.h> line was replaced by the contents of the header file stdio.h, because the compiler itself does not read header files; it only sees the single pre-processed file. So the pre-processor resolves the includes and macros before the compiler does its work.


By now you should understand what the pre-processor does. Let us now look at how the pre-processor searches for files and which conditions it evaluates.
There is a flag called " -v ", for verbose. Many commands work with this flag. It gives us verbose output along with the normal output of the command, meaning the compiler reports which operations it performed while generating the output.
By using the -v flag we can check what the pre-processor is actually doing while generating the pre-processed file.

command:     gcc add.c -E -v -o add1.i

 The output generated by the above command will look like this.

This is what gcc does while building the pre-processed output of a C program. I know it looks very messy and difficult to follow. The first thing gcc does is read its " specs ".
The specs tell gcc which tools to run and which options to pass to them. Next, gcc reports the target platform, i.e. the platform and processor it builds the output for. On my system it shows i686-linux-gnu, which is my architecture. I hope I don't have to place a screen-shot of that line; it's in the fourth line. The next few lines show how gcc itself was configured and which version it is. After that, gcc invokes a tool called " cc1 ".
Strictly speaking, cc1 is the C compiler proper. In the gcc versions I know of, the pre-processor is built into cc1, and because of the -E flag cc1 performs only the pre-processing step here: it expands all the macros and places the content of the header files into the add1.i file. In the below screen-shot I have highlighted the cc1 tool.

gcc passes on the flags we specified manually and adds the rest of the flags implicitly. If you look carefully, you can find the search paths in the output, which indicate where the compiler will look for header files and libraries. I will highlight some of the predefined paths which are displayed in the output.

I hope I have covered all the major topics related to the pre-processor. If I happen to learn anything new about the pre-processor, I will surely add it to this article and notify everyone about it.