Compiling, Projects, Libraries
When using C++ integrated development environments (IDEs) or
stand-alone compilers like g++, you'll have an easier time setting
things up for your students if you understand how the separate phases of
compiling and linking are used in creating an
executable program.
Understanding these phases will help you sort out the different options
for setting up environments for you and your students. This document
provides some explanation of how these phases work and how the
#include preprocessor directive works in creating libraries
and programs.
You'll need to have a reasonable grasp of these concepts no matter which
environment you use. All the major IDEs, including Borland/Turbo C++,
Metrowerks Codewarrior, and Microsoft Visual C++, use the approach of
projects and libraries discussed here.
Compiling and Linking
To show the differences between compilation and linking, and to
illustrate the alternatives available in terms of libraries and header
files for student projects, we'll use a program that prints a
calendar for a month and year entered by the user. (See below for access
to the entire program; for the purposes of this explanation we need to
look only at the .h, or header, files that are included in the program.)
The beginning of the program named calendar.cpp follows:
#include <iostream>
using namespace std;
#include "date.h"
// make a calendar for any month in any year
Header files and #include
The first stage in compiling calendar.cpp is the preprocessor, which
runs before the compiler is invoked and handles the #include
directives. Each #include directive causes the preprocessor to
literally copy and paste the indicated header file, here iostream and
date.h, into the text/code that will be compiled. This means that the
compiler must compile all the declarations and code in iostream and
date.h in addition to the user-written code in calendar.cpp. Most IDEs
use the convention of angle brackets for system-supplied header files,
e.g., <iostream>, and quotes for user-supplied header files, e.g.,
"date.h".
In older examples, standard include files like iostream had a
.h suffix, e.g., iostream.h. The C++ standard says that the
name should be iostream and places the standard names in the namespace
std, so a program either adds a using namespace std directive after the
include files that use the standard namespace or qualifies each
standard name explicitly, e.g., std::cout.
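The two ways of reaching names in the standard namespace can be
sketched as follows. This is a minimal illustration, assuming a
standard-conforming compiler; the function names qualifiedGreeting and
directiveGreeting are hypothetical and are not part of the calendar
program.

```cpp
#include <sstream>
#include <string>

// Style 1: qualify every standard name with std::
std::string qualifiedGreeting() {
    std::ostringstream out;
    out << "hello" << " " << "world";
    return out.str();
}

// Style 2: a using directive makes the names in std visible without
// qualification. Here it is placed inside the function, so it applies
// only within this function body.
std::string directiveGreeting() {
    using namespace std;
    ostringstream out;
    out << "hello" << " " << "world";
    return out.str();
}
```

Both functions build the same string; only the way the standard names
are written differs.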
However, header (.h) files typically contain only a class's
interface, not its implementation.
The implementation of a class is typically found in a file with the same
prefix (e.g., date), but with a suffix (.cpp) that indicates the file
contains the code that implements the class rather than the
declaration of the class's interface found in the header file. To
create an executable calendar program, the code that implements the date
class, the string class, and the iostream classes must be combined with
the user-written code in calendar.cpp. This happens in two
phases: compiling and linking.
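The split between interface and implementation can be sketched in a
single listing. The Counter class, the file names in the comments, and
the countTo function are hypothetical, chosen only for illustration;
they are not part of the date or string classes.

```cpp
// ---- would live in a header file such as counter.h ----
// The interface: what the class offers, with no function bodies.
class Counter {
public:
    Counter();
    void increment();
    int value() const;
private:
    int myCount;
};

// ---- would live in an implementation file such as counter.cpp ----
// The implementation: the bodies of the member functions.
Counter::Counter() : myCount(0) {}
void Counter::increment() { ++myCount; }
int Counter::value() const { return myCount; }

// ---- would live in the client program ----
// Client code is written against the interface only.
int countTo(int n) {
    Counter c;
    for (int i = 0; i < n; ++i) {
        c.increment();
    }
    return c.value();
}
```

In a real project the three parts would be separate files, and the
client file would see only the first part by including the header.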
Compiling source to object
Source code, typically found in files with a .cpp suffix, often consists
of definitions for functions, classes, constants, and variables. This
code is compiled into an object file as the compiler
translates C++ source code into architecture-dependent object code. On
many machines object files have a .obj or a .o suffix. These files are
not executable by themselves, but represent the compiled form of the
code in a .cpp file. To create an executable program, several .obj
files must be linked together.
Linking object files
In our example of the calendar program, the file calendar.cpp
is compiled (after the preprocessor has executed) into an object file
calendar.obj. The implementation of the date class must be
part of the final executable program as must the implementation of the
string class and the iostream classes. Although you can combine these
implementations manually, typically the programming environment (IDE)
handles this for you by the use of a project. To create an executable
program called calendar.exe, several object files must be
combined, or linked, together. In addition to calendar.obj, these
files are date.obj, the object code that is the implementation of the
date class, and apstring.obj if you're using the AP string class.
Finally,
the object code for the iostream classes and other classes you may not
realize are needed (to support, for example, a console application on
the computer you're using) must be linked too. In most programming
environments, the iostream classes and other support classes have .obj
files that are combined into a library, which is
a file that combines several .obj files into one file. Libraries
typically have .lib suffixes.
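The division of labor between the compiler and the linker described
above can be sketched in one listing. The functions isLeapYear and
daysInFebruary are hypothetical names used only for this illustration.
The compiler needs only the declaration of isLeapYear to translate the
code that calls it; the definition could sit in another .cpp file, and
the linker would connect the call to it.

```cpp
// Declaration only: enough for the compiler to translate any code
// that calls the function.
bool isLeapYear(int year);

// Client code, as it might appear in a program file. It compiles
// into object code even though no body for isLeapYear has been seen.
int daysInFebruary(int year) {
    return isLeapYear(year) ? 29 : 28;
}

// Definition: in a real project this would typically be in a
// different .cpp file, compiled to its own object file. The linker
// matches the call above with this code.
bool isLeapYear(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}
```

If the definition were missing from every object file and library
given to the linker, each file would still compile, but the link step
would fail with an error about an unresolved or undefined symbol.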
The process of including, compiling, and linking is illustrated in the
following diagram. A library file named xxx.lib is shown being
linked with the object code for the classes and program that are
combined to generate an executable named calendar.exe.
Note how some header files are included in each of the .cpp files. For
example, the Date class, whose interface is given in
date.h, uses the string class for some of its functions. This
means that #include "apstring.h" appears in the file
date.h. The preprocessor pastes the code found in
apstring.h whenever date.h is included because
preprocessing is recursive: any files that are included have all the
files that they include processed as well (and so on and so on).
In the diagram above, the file apstring.h is shown included in
calendar.cpp because although #include "apstring.h"
does not appear directly in calendar.cpp, the file
apstring.h is included when date.h is included since
date.h has a line #include "apstring.h".
Using Projects
One way to ensure that all the right code is compiled and linked is to
put all the .cpp files that have code your program needs into a project.
Some environments, e.g., Metrowerks, require a project; in other
environments (e.g., Turbo 3.0) projects are optional. In our example, a
project will need three .cpp files to create an executable:
- calendar.cpp
- date.cpp
- apstring.cpp
The code for the iostream classes is typically linked automatically
since the compiler knows what kind of project you are creating, e.g., a
console application.
You can put all three .cpp files into your project, then
build or make the program, which
starts the phases of compilation and linking to create an executable.
Access to header files
In some environments it is not enough to put .cpp files into a project;
you must also tell the environment/compiler where the header files you
use are found. This list of locations is called the include
path. You must set up the include path manually with
Borland/Turbo and Visual C++. Metrowerks usually infers the location of
header files automatically, but sometimes you'll need to tell it where
header files are located too.
Each environment uses a different method for setting up the include
path; an explanation of these methods can be found as part of the
discussion on creating projects. When
adding new directories/paths to the include path, be sure that you do
not erase the location of the system header files. If you get an error
message like:
cannot find file iostream
cannot find file date.h
typically this means that the environment's include path is not set up
properly: some directories are missing. Usually the order in which
directories appear in the include path is the order in which the
directories are searched for header files. Normally the order doesn't
matter, but sometimes you may have different versions of header files
and you'll find that the order does matter.
Alternatives in Using Projects
Requiring students to put all the .cpp files into a project will make
things difficult for beginning students who shouldn't be burdened with
knowing all the .cpp files they need. For example, in the program
calendar.exe illustrated above, the source code in calendar.cpp
doesn't make any direct references to strings. However,
because the Date class uses strings, the implementation of the
string class must be part of the project. It's difficult for beginning
students to see this and it's a chore for them to figure
out which .cpp files are needed in a project.
Fortunately there are several alternatives. These fall into two camps:
the use of a library and the inclusion of all source code via the
preprocessor. I prefer the library method for reasons described
below. Using a library means students will always have two files in a
project, the source code for their program (e.g., calendar.cpp)
and the library that includes all .obj files they might ever need, e.g.,
string, date, etc. Some instructors prefer to use only one file in a
project: the student source code; or prefer not to use a project when
first starting out. These instructors use the approach of including all
source code via the preprocessor. Both methods are described below.
Using Libraries
In the library approach, all code that students might need to link with
their programs, e.g., the string and date class implementations, is
combined into a library. Recall that a library is simply a collection
of .obj files that are linked in creating an executable program. In
addition to the library of user-defined classes like string and date, a
library of system classes like the iostream classes will be linked to
create an executable.
Each IDE uses a different sequence of steps to create a library. In
general, a project is created, all the .cpp files that correspond to
classes and code you want students to access are put in the project, and
a library is created from the .obj files that are compiled from the .cpp
files. Students then include the library, which might be called
tapestry.lib, or apstuff.lib, in their projects.
Instructions for creating libraries using
the major IDEs/compilers are available. Before creating a library, it
helps to understand the material in this document so that if something
doesn't work, you'll understand conceptually what's going on and will be
better able to find a solution to the problem.
Using the all-include approach
Some instructors prefer this method. It is simpler in that a project
only has the .cpp file written by a student (a library file is needed
when using the library approach). However, compilation time is longer
with the all-include approach because whenever a student recompiles her
program, all the string, date, etc. source code (including .cpp files)
will be recompiled too. On fast machines the recompilation time is not
a problem.
There are two ways of having all the .cpp source code compiled into a
user's program using the all-include method.
- The last line of each .h file, e.g., foo.h is
#include"foo.cpp". This means that whenever the header
file foo.h is included, the source code in
foo.cpp will be included too since the preprocessor
pastes in all included files.
- A header file is created that does all the including. This
file might be called tapestry.h or apstuff.h.
The header file would look like this:
#ifndef _TAPESTRY_H
#define _TAPESTRY_H
#include "apstring.cpp"
#include "date.cpp"
...
...
#endif
Since each .cpp file includes the .h files, all the .h files
are included as are all the .cpp files, whenever
tapestry.h is included. You can even put
#include<iostream.h> in the tapestry.h
file. Then students can be told that the only line they need
in any program they write is #include"tapestry.h".
The diagram below shows the include/compile/link process when using the
first approach outlined above, i.e., when each .h file includes,
as the last line, the corresponding .cpp file.
Note that because calendar.cpp includes date.h, it
indirectly includes date.cpp since the last line of
date.h will be #include"date.cpp". Because
date.h includes apstring.h which in turn includes
apstring.cpp, all these files are included when
calendar.cpp is compiled into calendar.obj. Only the
system library xxx.lib is linked with calendar.obj to
create the executable.
This approach works well, but if the user makes a small change in
calendar.cpp (a 56 line program), all the code in
apstring.cpp and date.cpp (a total of more than 600
lines of code) will be recompiled too. Again, this isn't a problem on
fast machines.
The #ifndef preprocessor command
Note in the diagram above that iostream.h will be included
twice when calendar.cpp is compiled: once directly by
calendar.cpp and once indirectly since apstring.h and
apstring.cpp include iostream.h. This
multiple inclusion can cause two problems. First, compiling the same
code more than once causes an error in many environments because the
same class or variable may be defined twice. Second, in some
situations file a includes file b, which includes file
c, which eventually includes a again. This can
lead to an infinite loop of inclusion.
Typically, every header file begins with a #ifndef preprocessor
command. For date.h this is:
#ifndef _DATE_H
#define _DATE_H
// ... the contents of date.h go here ...
#endif
When the preprocessor, which is responsible for the #include, includes
date.h for the first time, it processes the #ifndef command
conceptually as follows:
if the symbol _DATE_H is NOT defined, then continue preprocessing, but
if the symbol is already defined, skip everything up to the matching
#endif.
If preprocessing continues, the first thing that happens is that the
symbol _DATE_H becomes defined.
Because the symbol is undefined only the first time the file is
included (from that point on it is defined), the contents of the same
header file cannot be processed more than once during one run of the
compiler. This prevents multiple inclusions of the same file and
infinitely recursive inclusions.
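The effect of the guard can be sketched by writing out by hand what the
preprocessor would see if a guarded header were included twice. The
header point.h, the symbol POINT_H, and the function sumCoordinates are
hypothetical. The guard symbol is written here without a leading
underscore because standard C++ reserves names that begin with an
underscore followed by a capital letter for the compiler and its
libraries.

```cpp
// First inclusion of the hypothetical header point.h:
// POINT_H is not yet defined, so the contents are processed
// and POINT_H becomes defined.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Second inclusion of the same header: POINT_H is now defined,
// so everything up to the matching #endif is skipped and Point
// is not defined a second time.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Code that uses the declaration compiles normally.
int sumCoordinates(const Point& p) {
    return p.x + p.y;
}
```

Without the #ifndef lines, the second copy of struct Point would be a
redefinition, which is a compile error.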
Code
The calendar, date, and string code mentioned here is freely available
to those teaching computer science courses, whether using my textbook or
not. See the AP web page or the book web page.
Owen L. Astrachan
Last modified: Tue Dec 12 12:10:18 EST 2000