
Thursday, July 17, 2014

HOW A PROGRAM EXECUTES WHEN LOADED INTO MEMORY (PART 1).............ARTICLE 32

We have learned how a program compiles, which tools are involved internally during compilation, what the compiled code comprises, what runtime code is, and what a PCB is.

Now we will use these concepts to understand how a program gets executed.

A program comprises some number of functions, and each of these functions can have many variables in it. The variables declared inside a function are called local variables. Only the global variables are assigned addresses at compile time; local variables are given addresses only while their function is running. So the compiled file does not fix any addresses for the variables inside the functions. Before a function starts executing, we need to know how much data it contains, and we need additional memory in which to store its local variables. The total amount of this memory cannot be fixed at compile time, because local variables are allocated only at runtime, as functions are called.

I will make this concept clearer with an example. Before the example, I want to explain a data structure called the symbol table.

SYMBOL TABLE:

A symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location. 
Hence, by analyzing the contents of the symbol table, we can easily check whether the global variables have been assigned addresses or not. If the global variables are assigned addresses at compile time, the symbol table will contain them.

In Linux, we have a tool called "nm" which displays the symbol table of an object file. Let's check the man page of this tool.
The man page clearly says that this command is used to list symbols from object files. I suggest that all readers go through the man page on their own Linux system. Online man pages are also available; just google it.

Now we will write a small program in which we declare four global variables and four local variables.
This program has 8 variables, of which four are global and four are local. One of the global variables is initialized. When I request the symbol table for the executable file of this program, we can observe that all the global variables are assigned addresses, whereas the local variables have no addresses assigned.

This is the symbol table for the above program. You can see three columns in the output. The first column displays the address of the identifier. The second column displays the type of the identifier, with each type identified by a different letter. For example, "A" means the symbol's value is absolute and will not be changed by further linking. Many other letters are used to identify different types of symbols; refer to the man page for further details. The third column lists the names of the symbols in the object file.

In the above symbol table, if you observe carefully, the variables a, b, c and d are allocated addresses. The variables b, c and d are not initialized, and hence they are of type "B", which means the symbol is uninitialized and falls in the BSS segment. We have initialized the symbol 'a' with the value 10, and hence it is of type "D", which means the symbol is in the initialized data section. From this we can clearly see that only the global variables are allocated addresses during the compilation process.

So, coming back to how an executable file executes: the executable has information only about how many global variables we have and how much code we have. It carries no information about the local data that each function will use; that data comes into existence only when the code starts executing, that is, at runtime.

Now the runtime code comes into the picture; this is one of the reasons why we need it. After the code is loaded into memory and before the program starts executing, a block of memory called the stack is set up. The stack is where data is allocated at runtime. Hence the main job of the runtime code, or startup code, is to initialize the stack. The executable image only provides information about what we have declared; the stack is not part of the executable file.

FUNCTIONS OF THE RUNTIME ROUTINES(_init, _start, _fini):

When an application starts executing, it begins with the runtime routines. The runtime code is responsible for carrying out the following operations.

1) Allocation of the stack segment in the process address space. 
2) Validating and setting up argument stack using command line parameters.
3) Invoking the function where the program's own functionality starts.

When the program starts executing, the first function to run is the __init() function, where "init" means initialize. The runtime code is what sets up the stack. The main() function receives arguments, which means something must pass these arguments to it. The runtime code initializes the stack, receives the command-line arguments from the terminal and passes them to the main function.

The runtime code is implemented to call the main() function. It assumes that main() is the first function of the program to be executed, and hence it always starts the program with main(). I will explain this conceptually first and then use an example. The __init() function calls the __start() function, and __start() calls main(). When main() returns, __start() calls __fini(), which performs the cleanup and terminates the program. Note that this is a simplified model: in a real GCC/glibc build on Linux the symbols are spelled _init, _start and _fini, and the entry point is _start.

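#include <stdio.h>

int main(void);
int __fini(void);

// conceptual start routine, called by __init() in this model
void __start(void)
{
    main();        // calling the main function
    __fini();      // calling the fini function once main() returns
}

int main(void)
{
    printf("main function called by __start()\n");
    return 0;
}

int __fini(void)
{
    printf("perform the clean up\n");
    return 0;
}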


If you observe the above code, the function calls are as follows.
__init() ----> __start() ----> main() ---->executes and returns to __start() ----> __fini() ----> executes and returns to __start() ----> __start() returns to its caller __init().

Now you understand why we use the main function in our programs: the __start() function is implemented to call main(). GCC allows us to change this default behaviour.
