Compiling, Projects, Libraries

When using C++ integrated development environments (IDEs) or stand-alone compilers like g++, you'll have an easier time setting things up for your students if you understand how the separate phases of compiling and linking are used in creating an executable program.

Understanding these phases will help you sort out the different options for setting up environments for you and your students. This document provides some explanation of how these phases work and how the #include preprocessor directive works in creating libraries and programs.

You'll need to have a reasonable grasp of these concepts no matter which environment you use. All the major IDEs, including Borland/Turbo C++, Metrowerks CodeWarrior, and Microsoft Visual C++, use the approach of projects and libraries discussed here.

Compiling and Linking

To show the difference between compiling and linking, and to illustrate the alternatives for libraries and header files in student projects, we'll use a program that prints a calendar for a month and year entered by the user. (See below for access to the entire program; for the purposes of this explanation we only need to look at the .h, or header, files that are included in the program.) The beginning of the program, named calendar.cpp, follows:

   #include <iostream>
   using namespace std;
   #include "date.h"

   // make a calendar for any month in any year

Header files and #include

The first stage in compiling calendar.cpp is preprocessing. The preprocessor runs before the compiler proper and handles the #include directives in calendar.cpp. An #include directive causes the preprocessor to literally copy the text of the named header file, here iostream and date.h, into the code that will be compiled. This means that the compiler must compile all the declarations and code in iostream and date.h in addition to the user-written code in calendar.cpp. By convention, angle brackets are used for system-supplied header files, e.g., <iostream>, and quotes are used for user-supplied header files, e.g., "date.h". When a header name is in quotes, most compilers look first in the directory of the file containing the #include.

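In older examples, standard header files like iostream had a .h suffix, e.g., iostream.h. Standard C++ drops the suffix, so the header is named iostream, and it places the names it declares in the namespace std. Programs must therefore either qualify those names, e.g., std::cout, or add the directive using namespace std; after the #include directives, as calendar.cpp does.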

Header (.h) files typically contain only a class's interface, not its implementation. The implementation of a class is typically found in a file with the same prefix (e.g., date) but with a .cpp suffix. This suffix indicates that the file holds code, that is, the definitions of the class's member functions, rather than the class declaration found in the header file. To create an executable calendar program, the code that implements the date class, the string class, and the iostream classes must be combined with the user-written code in calendar.cpp. This happens in two phases: compiling and linking.
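The sketch below shows what the two files might contain. The Date class is hypothetical. Its member functions Month, Year, and DaysIn were chosen for this example and are not the interface of the date class used with calendar.cpp. Both files appear in one block only so that the sketch compiles on its own, and comments mark where each file begins. The #ifndef lines are an include guard, which is explained later in this document.

```cpp
#include <cassert>

// ---- date.h: the interface (declarations only) ----
#ifndef DATE_H
#define DATE_H

class Date
{
  public:
    Date(int month, int year);   // a month (1-12) in a given year
    int Month() const;
    int Year() const;
    int DaysIn() const;          // number of days in this month

  private:
    int myMonth;
    int myYear;
};

#endif

// ---- date.cpp: the implementation (definitions) ----
// In a real project this is a separate file that begins with
// #include "date.h" and is compiled on its own into date.obj (or date.o).

Date::Date(int month, int year)
  : myMonth(month), myYear(year)
{
    assert(1 <= month && month <= 12);
}

int Date::Month() const { return myMonth; }
int Date::Year() const { return myYear; }

int Date::DaysIn() const
{
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    bool leap = (myYear % 4 == 0 && myYear % 100 != 0) || myYear % 400 == 0;
    if (myMonth == 2 && leap)
    {
        return 29;
    }
    return days[myMonth - 1];
}
```

A program that includes date.h can write Date(2, 2000).DaysIn() and get 29 without ever seeing date.cpp. It only needs to be linked with the object code compiled from date.cpp.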

Compiling source to object

Source code, typically found in files with a .cpp suffix, consists of definitions of functions, classes, constants, and variables. The compiler translates this C++ source code into architecture-dependent object code and stores it in an object file. On many machines object files have a .obj or .o suffix. An object file is not executable by itself. It represents the compiled form of the code in one .cpp file. To create an executable program, several object files must be linked together.

Linking object files

In our calendar example, the file calendar.cpp is compiled (after the preprocessor has run) into an object file, calendar.obj. The implementation of the date class must be part of the final executable program, as must the implementations of the string class and the iostream classes. You can combine these implementations manually, but the programming environment (IDE) typically handles this for you through a project. To create an executable program called calendar.exe, several object files must be linked together with calendar.obj. These are date.obj, the object code that implements the date class, and apstring.obj if you're using the AP string class. The object code for the iostream classes must be linked too, along with other classes you may not realize are needed (for example, to support a console application on your computer). In most programming environments, the iostream classes and other support classes are compiled into .obj files that are combined into a library, which is a single file that packages several .obj files. Libraries typically have a .lib suffix.
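The following sketch shows the split between compiling and linking in miniature. The functions DaysInMonth and DaysInYear are hypothetical. The declaration and the definition are placed in one block, rather than in date.h and date.cpp, so that the example compiles on its own. Comments indicate where each piece would normally live.

```cpp
#include <cassert>

// What calendar.cpp gets from date.h: a declaration only. This is enough
// for the compiler to translate calls to DaysInMonth into calendar.obj.
// Each call is recorded as a reference for the linker to resolve.
int DaysInMonth(int month, int year);

// User-written code in calendar.cpp that relies only on the declaration.
int DaysInYear(int year)
{
    int total = 0;
    for (int month = 1; month <= 12; month++)
    {
        total += DaysInMonth(month, year);
    }
    return total;
}

// The definition would normally live in date.cpp and reach the program
// through date.obj. If no object file or library supplies it, calendar.cpp
// still compiles, but linking fails with an error such as
// "undefined reference" (g++) or "unresolved external symbol" (Visual C++).
int DaysInMonth(int month, int year)
{
    assert(1 <= month && month <= 12);
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    if (month == 2 && leap)
    {
        return 29;
    }
    return days[month - 1];
}
```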

The processes of including, compiling, and linking are illustrated in the following diagram. A library file named xxx.lib is shown being linked with the object code for the classes and the program to generate an executable named calendar.exe.

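[Diagram: including, compiling, and linking calendar.cpp, date.cpp, and apstring.cpp with xxx.lib to produce calendar.exe]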

Note how some header files are included in each of the .cpp files. For example, the Date class, whose interface is given in date.h, uses the string class for some of its functions. This means that #include "apstring.h" appears in the file date.h, so the preprocessor pastes in the code found in apstring.h whenever date.h is included. Preprocessing is recursive: every file that is included has all the files it includes processed as well, and so on.

In the diagram above, apstring.h is shown included in calendar.cpp. Although #include "apstring.h" does not appear directly in calendar.cpp, apstring.h is included when date.h is included, because date.h contains the line #include "apstring.h".
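The sketch below shows roughly what the compiler sees for calendar.cpp after the nested #include directives have been replaced by file text. The names MonthName and Heading are hypothetical. The apstring type is a typedef for std::string that stands in for the real AP string class, so that the block compiles by itself. The #include <iostream> and using directive at the top of calendar.cpp are omitted.

```cpp
// Roughly what the compiler sees for calendar.cpp after preprocessing.
// Comments mark text the preprocessor pasted in from each header.
#include <cassert>
#include <string>

// ---- begin date.h (pasted by #include "date.h" in calendar.cpp) ----

// ---- begin apstring.h (pasted by #include "apstring.h" in date.h) ----
// Stand-in only: the real apstring.h declares the AP string class here.
typedef std::string apstring;
// ---- end apstring.h ----

apstring MonthName(int month);   // part of date.h's interface; it uses apstring

// ---- end date.h ----

// ---- calendar.cpp's own code ----
// calendar.cpp never writes #include "apstring.h", yet it can use apstring
// because the text of apstring.h arrived inside the text of date.h.
apstring Heading(int month, int year)
{
    return MonthName(month) + " " + std::to_string(year);
}

// ---- normally in date.cpp, compiled separately into date.obj ----
apstring MonthName(int month)
{
    assert(1 <= month && month <= 12);
    static const char* const names[] = {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"};
    return names[month - 1];
}
```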

Using Projects

One way to ensure that all the right code is compiled and linked is to put every .cpp file containing code your program needs into a project. Some environments, e.g., Metrowerks, require a project. In others (e.g., Turbo 3.0) projects are optional. In our example, a project needs three .cpp files to create an executable: calendar.cpp, date.cpp, and apstring.cpp. The code for the iostream classes is typically linked automatically, because the environment knows what kind of project you are creating, e.g., a console application.

You can put all three .cpp files into your project and then build (or make) the program. This starts the compiling and linking phases that create an executable.

Access to header files

In some environments it is not enough to put .cpp files into a project. You must also tell the environment/compiler where the header files you use are found. This list of locations is called the include path. You must set up the include path manually with Borland/Turbo and Visual C++. Metrowerks usually infers the location of header files automatically, but sometimes you'll need to tell it where header files are located too.

Each environment uses a different method for setting up the include path. An explanation of these methods can be found as part of the discussion on creating projects. When adding new directories to the include path, be sure that you do not erase the location of the system header files. If you get an error message like:

   cannot find file iostream
   cannot find file date.h

this typically means that the environment's include path is not set up properly and some directories are missing. The directories in the include path are usually searched for header files in the order in which they appear. Normally the order doesn't matter. However, if you have different versions of the same header file in different directories, the order does matter, because the first matching file found is the one that gets included.

Alternatives in Using Projects

Requiring students to put all the .cpp files into a project makes things difficult for beginning students, who shouldn't be burdened with knowing all the .cpp files they need. For example, in the calendar program illustrated above, the source code in calendar.cpp doesn't make any direct references to strings. However, because the Date class uses strings, the implementation of the string class must be part of the project. It's difficult for beginning students to see this, and it's a chore for them to figure out which .cpp files a project needs.

Fortunately there are several alternatives. These fall into two camps: using a library, and including all source code via the preprocessor. I prefer the library method for reasons described below. With a library, students will always have two files in a project: the source code for their program (e.g., calendar.cpp) and a library that contains all the .obj files they might ever need, e.g., for string, date, etc. Some instructors prefer to have only one file in a project, the student's source code, or prefer not to use a project at all when first starting out. These instructors include all source code via the preprocessor. Both methods are described below.

Using Libraries

In the library approach, all code that students might need to link with their programs, e.g., the string and date class implementations, is combined into a library. Recall that a library is simply a collection of .obj files that are linked in creating an executable program. In addition to the library of user-defined classes like string and date, a library of system classes like the iostream classes will be linked to create an executable.

Each IDE uses a different sequence of steps to create a library. In general, you create a project and put into it all the .cpp files for the classes and code you want students to access. You then build a library from the .obj files compiled from those .cpp files. Students add the library, which might be called tapestry.lib or apstuff.lib, to their projects.

Instructions for creating libraries using the major IDEs/compilers are available. Before creating a library, it helps to understand the material in this document so that if something doesn't work, you'll understand conceptually what's going on and will be better able to find a solution to the problem.

Using the all-include approach

Some instructors prefer this method. It is simpler in that a project contains only the .cpp file written by the student, whereas the library approach also requires a library file. However, compilation takes longer with the all-include approach. Whenever a student recompiles her program, all the string, date, etc. source code (including the .cpp files) is recompiled too. On fast machines the recompilation time is not a problem.

There are two ways of having all the .cpp source code compiled into a user's program using the all-include method.

The diagram below shows the include/compile/link process for the first of these methods, in which each .h file includes the corresponding .cpp file as its last line.

Because calendar.cpp includes date.h, it indirectly includes date.cpp, since the last line of date.h is #include "date.cpp". Likewise, date.h includes apstring.h, which in turn includes apstring.cpp. All of these files are therefore included when calendar.cpp is compiled into calendar.obj. Only the system library xxx.lib is linked with calendar.obj to create the executable.

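[Diagram: with the all-include approach, calendar.cpp, date.h, date.cpp, apstring.h, and apstring.cpp are compiled together into calendar.obj, which is linked with xxx.lib to produce calendar.exe]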

This approach works well, but if the user makes a small change in calendar.cpp (a 56-line program), all the code in apstring.cpp and date.cpp (more than 600 lines in total) is recompiled too. Again, this isn't a problem on fast machines.
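The sketch below shows the single file the compiler effectively sees for calendar.cpp under the all-include approach. It uses a hypothetical DaysInMonth function in place of the real date class and omits apstring for brevity. It assumes that the #include "date.cpp" line sits just before date.h's closing #endif, so that the include guard described in the next section protects the definitions as well as the declarations.

```cpp
// Sketch of the translation unit for calendar.cpp under the all-include
// approach: date.h ends by including date.cpp, so declarations and
// definitions are compiled together into calendar.obj.
#include <cassert>

// ---- begin date.h ----
#ifndef DATE_H
#define DATE_H

int DaysInMonth(int month, int year);   // interface

// Last line before #endif in date.h: #include "date.cpp"
// ---- begin date.cpp (pasted in by date.h) ----
int DaysInMonth(int month, int year)
{
    assert(1 <= month && month <= 12);
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : days[month - 1];
}
// ---- end date.cpp ----

#endif
// ---- end date.h ----

// ---- calendar.cpp's own code ----
// Days in the year before the first day of month, e.g., 31 for February.
int DaysBefore(int month, int year)
{
    assert(1 <= month && month <= 12);
    int total = 0;
    for (int m = 1; m < month; m++)
    {
        total += DaysInMonth(m, year);
    }
    return total;
}
```

Because date.cpp is already compiled into calendar.obj, it must not also be added to the project. If it were, DaysInMonth would be defined in both calendar.obj and date.obj, and the linker would report a duplicate definition.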

The #ifndef preprocessor command

Note in the diagram above that iostream will be included twice when calendar.cpp is compiled: once directly by calendar.cpp, and once indirectly because apstring.h and apstring.cpp include iostream. Multiple inclusion can cause two problems. First, compiling the same code more than once in a single source file is an error, because the same class or variable would be defined twice. Second, if file a includes file b, which includes file c, which eventually includes a again, the inclusion can loop forever.

Typically, every header file begins with a #ifndef preprocessor command. For date.h this is:

   #ifndef _DATE_H
   #define _DATE_H

   // ... declarations for the Date class ...

   #endif

The preprocessor is responsible for handling #include. When it includes date.h for the first time, it processes the #ifndef directive conceptually as follows:
if the symbol _DATE_H is NOT defined, then continue preprocessing the file, but if the symbol is already defined, skip everything up to the matching #endif, which in date.h is the rest of the file.

If preprocessing continues, the first thing that happens is that the symbol _DATE_H becomes defined.

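The symbol is undefined only the first time the file is processed, and from that point on it is defined. As a result, the contents of a header file are processed at most once while one source file is being compiled. This prevents multiple inclusions of the same file as well as infinitely recursive inclusions. Each .cpp file is compiled separately, though, so every .cpp file that includes date.h still gets its own copy of the declarations.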
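You can check the effect of the guard in a single file by pasting the guarded text of a header in twice. This is what the preprocessor would do if date.h were included both directly and indirectly. In this sketch the Date struct and the MonthOf function are hypothetical. The sketch uses the symbol DATE_H rather than _DATE_H, because in standard C++ names that begin with an underscore followed by an uppercase letter are reserved for the compiler and library implementation.

```cpp
#include <cassert>

// Text of date.h pasted in the first time (e.g., directly by calendar.cpp).
#ifndef DATE_H          // DATE_H is not yet defined, so this section is kept
#define DATE_H
struct Date
{
    int month;
    int year;
};
#endif

// The same text pasted in a second time (e.g., indirectly via another header).
#ifndef DATE_H          // DATE_H is now defined, so everything up to #endif is skipped
#define DATE_H
struct Date             // never compiled; without the guard this would be a redefinition error
{
    int month;
    int year;
};
#endif

// Only one definition of Date reached the compiler, so this compiles.
int MonthOf(const Date& d)
{
    assert(1 <= d.month && d.month <= 12);
    return d.month;
}
```

If you delete the guard lines, the second struct Date becomes a redefinition and the file no longer compiles.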

Code

The calendar, date, and string code mentioned here is freely available to those teaching computer science courses, whether or not they use my textbook. See the AP web page or the book web page.
Owen L. Astrachan
Last modified: Tue Dec 12 12:10:18 EST 2000