Chapter 13. Programming Languages
There's much more to Linux than simply using the system. One of the
benefits of free software is that it can be modified to suit the
user's needs. This applies equally to the many free applications
available for Linux and to the Linux kernel itself.
Linux supports an advanced programming interface, using
GNU compilers and tools, such as the
gcc compiler, the gdb debugger,
and so on. A number of other programming languages, including Perl,
Tcl/Tk, and LISP, are also supported. Whatever your
programming needs, Linux is a great choice for developing
Unix applications. Because the complete source code
for the libraries and the Linux kernel is provided, programmers who
need to delve into the system internals are able to do so.[46]
[46]On a variety of Unix
systems, the authors have repeatedly found available documentation to be
insufficient. With Linux, you can explore the very source code for the
kernel, libraries, and system utilities. Having access to source code
is more important than most programmers would think.
Linux is an ideal platform for developing software to run under the X
Window System. The Linux X distribution, as described
in Chapter 10, "Installing the X
Window System", is a
complete implementation with
everything you need to develop and support X applications.
Programming for X itself is portable across platforms, so the
X-specific portions of your application should compile cleanly on other
Unix systems.
In this chapter, we'll explore the Linux programming environment and
give you a five-cent tour of the many facilities it provides. Half of the
trick to Unix programming is knowing what tools are
available and how to use them effectively. Often the most useful
features of these tools are not obvious to new users.
Since C programming has been the basis of most large projects (even
though it is nowadays being replaced more and more by C++) and is the
language common to most modern programmers--not only on
Unix, but on many other systems as
well--we'll start by telling you what tools are
available for it. The first few sections of the chapter assume you
are already a C programmer.
But several other tools are emerging as important resources,
especially for system administration. We'll examine two in this
chapter: Perl and Tcl/Tk. They are both scripting languages like the
Unix shells, taking care of grunt work like memory allocation, so you
can concentrate on your task. But both Perl and Tcl/Tk offer a degree
of sophistication that makes them more powerful than shell scripts
and appropriate for many programming tasks.
Lots of programmers are excited about trying out Java, the new
language from Sun Microsystems. While most people associate Java with
interactive programs (applets) on web pages, it is actually
a general-purpose language with many potential Internet uses. In a
later section, we'll
explore what Java offers above and beyond older programming languages
and how to get started.
13.1. Programming with gcc
The C programming language is by far the most used in
Unix software development. Perhaps this is because
the Unix system itself was originally developed in
C; it is the native tongue of Unix.
Unix C compilers have traditionally defined the
interface standards for other languages and tools, such as linkers,
debuggers, and so on. Conventions set forth by the original C
compilers have remained fairly consistent across the
Unix programming board. To know the C compiler is
to know the Unix system itself. Before we get too
abstract, let's get to details.
The GNU C compiler, gcc, is one
of the most versatile and advanced compilers around. Unlike other C
compilers (such as those shipped with the original AT&T or
BSD distributions or from various third-party
vendors), gcc supports all the modern C standards
currently in use--such as the ANSI C
standard--as well as many extensions specific to
gcc itself. Happily, however,
gcc provides features to make it compatible with
older C compilers and older styles of C programming. There is even a
tool called protoize that can help you write
function prototypes for old-style C programs.
gcc is also a C++ compiler. For those of you who
prefer the obscure object-oriented environment, C++ is supported with
all of the bells and whistles--including AT&T 3.0 C++ features, such as method
templates. Complete C++ class libraries are provided as well, such as
the iostream library familiar to many
programmers.
For those with a taste for the particularly esoteric,
gcc also supports Objective-C, an object-oriented C
spinoff that never gained much popularity. But the fun doesn't stop
there, as we'll see.
There's also a new kid on the block,
egcs. egcs is not a completely
new compiler, but is based on
gcc. egcs has some advanced
optimization features and is especially strong when it comes to newer
C++ features like templates and namespaces. If you are going to do
serious C++ programming, you will probably want to check it out. Alas,
there is a problem with Version
2.0.x kernels that prevents them from being compiled
with egcs. Newer kernels from the
2.1.x and 2.2.x series don't have this problem. But because
of this, some distributors have opted to include the traditional
gcc for C compilation and egcs
for C++. You can read more about egcs at
http://egcs.cygnus.com.
The Free Software Foundation has recently announced that
egcs will become its default compiler, thus
replacing egcs's own ancestor, gcc.
In this section, we're going to cover the use of gcc to
compile and link programs under Linux. We assume you are
familiar with programming in C/C++, but we don't assume you're
accustomed to the Unix programming
environment. That's what we'll introduce here.
13.1.1. Quick Overview
Before imparting all of the gritty details of gcc itself,
we're going to present a simple example and walk through the steps
of compiling a C program on a Unix
system.
Let's say you have the following
bit of code, an encore of the much-overused "Hello, World!" program
(not that it bears repeating):
#include <stdio.h>
int main() {
    (void)printf("Hello, World!\n");
    return 0; /* Just to be nice */
}
To compile this program into a living, breathing executable, there are
several steps. Most of these steps can be accomplished
through a single gcc command, but the specifics are left for
later in the chapter.
First, the gcc compiler must generate an object file from
this source code. The object file is essentially the
machine-code equivalent of the C source. It contains code to
set up the main()
calling stack, a call to the mysterious
printf() function, and code to return the value of 0.
The next step is to link the object file to produce an executable.
As you might guess, this is done by the linker.
The job of the linker is to take object files, merge them with code from
libraries, and spit out an executable. The object code from the previous
source does not make a complete executable. First and foremost, the
code for printf() must be linked in. Also, various initialization
routines, invisible to the mortal programmer, must be appended to the
executable.
Where does the code for printf() come
from? Answer: the libraries. It
is impossible to talk for long about gcc without making mention
of them. A library is essentially a collection of many object files,
including an index. When searching for the code for printf(),
the linker looks at the index for each library it's been told to link
against. It finds the object file containing the printf() function
and extracts that object file (the entire object file, which
may contain much more than just the printf() function) and links
it to the executable.
In reality, things are more complicated than this. As we have said, Linux
supports two kinds of libraries: static and shared. What we have
described in this example are static libraries: libraries where the actual code
for called subroutines is appended to the executable. However, the code for
subroutines such as printf() can be quite lengthy. Because many
programs use common subroutines from the libraries, it doesn't make sense
for each executable to contain its own copy of the library code. That's
where shared libraries come in.
With shared libraries, all of the common subroutine code is contained
in a single library "image file" on disk. When a program is linked
with a shared library, stub code
is appended to the executable, instead of
actual subroutine code. This stub code tells the program loader
where to find the library code on disk, in the image file,
at runtime. Therefore, when our friendly "Hello, World!"
program is
executed, the program loader notices that the program has been linked
against a shared library. It then finds the shared library image and
loads code for library routines, such as
printf(), along with the
code for the program itself. The stub code tells the loader where
to find the code for printf() in the image file.
Even this is an oversimplification of what's really going on. Linux shared
libraries use jump tables that allow the libraries to be upgraded
and their contents to be jumbled around, without requiring the executables
using these libraries to be relinked. The stub code in the executable actually looks up another reference in the library itself--in
the jump table. In this way, the library contents and the corresponding
jump tables can be changed, but the executable stub code can remain the
same.
But don't allow yourself to be befuddled by all this abstract
information. In time, we'll approach a real-life example and show you
how to compile, link, and debug your programs. It's actually very
simple; most of the details are taken care of for you by the gcc
compiler itself. However, it helps to have an understanding of what's
going on behind the scenes.
13.1.2. gcc Features
gcc has more features than we could possibly enumerate here.
The gcc manual page and Info document will undoubtedly give you
an eyeful of interesting information about this compiler. In this
section, we give you a short overview of the most useful gcc
features to get you started. With this in hand, you should be able to
figure out for yourself how to get the many other facilities to work to
your advantage.
For starters, gcc supports the "standard" C syntax currently
in use, specified for the most part by the ANSI C standard. The most
important feature of this standard is function prototyping. That is,
when defining a function foo(), which returns an int and
takes two arguments, a (of type char *) and b (of
type double),
the function may be defined like this:
int foo(char *a, double b) {
    /* your code here... */
}
This is in contrast to the older, nonprototype function definition
syntax, which looks like:
int foo(a, b)
char *a;
double b;
{
    /* your code here... */
}
which is also supported by gcc.
Of course, ANSI C defines many other conventions, but this is the
one most obvious to the new programmer. Anyone familiar with
C programming style in modern books, such as the second edition of
Kernighan and Ritchie's The C Programming Language,
can program using gcc with no problem. (C compilers shipped
on some other Unix systems do not support ANSI features such as prototyping.)
The gcc compiler boasts quite an impressive optimizer. Whereas most
C compilers allow you to use the single switch -O to specify
optimization, gcc supports multiple levels of optimization.
At the highest level of optimization, gcc pulls tricks out of
its sleeve such as allowing code and static data to be shared. That is,
if you have a static string in your program such as Hello, World!,
and the ASCII encoding of that string happens to coincide with a sequence
of instruction code in your program, gcc allows the string
data and the corresponding code to share the same storage. How's that for
clever?
Of course, gcc allows you to compile debugging information into
object files, which aids a debugger (and hence, the programmer) in tracing
through the program. The compiler inserts
markers in the object file, allowing the debugger to locate specific
lines, variables, and functions in the compiled program. Therefore, when
using a debugger, such as gdb (which we'll talk about later in the
chapter), you can step through the compiled
program and view the original source text simultaneously.
Among the other tricks offered by gcc is the ability to generate assembly
code with the flick of a switch (literally). Instead of telling
gcc to compile your source to machine code, you can ask it to
stop at the assembly-language level, which is much easier for humans
to comprehend. This happens to be a nice way to learn the intricacies
of protected-mode assembly programming under Linux: write some C
code, have gcc translate it into assembly language
for you, and study
that.
gcc includes its own assembler (which can be used independently of
gcc), just in case you're wondering
how this assembly-language code might get assembled. In fact, you can
include inline assembly code in your C source, in case you need to
invoke some particularly nasty magic but don't want to write exclusively
in assembly.
13.1.3. Basic gcc Usage
By now, you must be itching to know how to invoke all these wonderful
features. It is important, especially to novice Unix and C programmers,
to know how to use gcc effectively.
Using a command-line compiler such as gcc is quite different from,
say, using a development system such as Borland C under MS-DOS.
Even though the language syntax itself is similar, the methods used to
compile and link programs are not at all the same.
Let's return to our innocent-looking "Hello, World!" example.
How would you go about compiling and linking this program?
The first step, of course, is to enter the source code. This is
accomplished with a text editor, such as Emacs or vi.
The would-be programmer should enter the source code and save it
in a file named something like hello.c. (As with most C compilers,
gcc is picky about the filename extension; that is how it
distinguishes C source from assembly source from object files, and
so on. The .c extension should be used for standard C source.)
To compile and link the program to the executable hello, the
programmer would use the command:
papaya$ gcc -o hello hello.c
and (barring any errors), in one fell swoop, gcc
compiles the
source into an object file, links against the appropriate libraries, and
spits out the executable hello, ready to run. In fact, the wary
programmer might want to test it:
papaya$ ./hello
Hello, World!
papaya$
As friendly as can be expected.
Obviously, quite a few things took place behind the scenes when
executing this single gcc command. First of all, gcc
had to compile your source file, hello.c, into an object file,
hello.o. Next, it had to link hello.o against the
standard libraries and produce an executable.
By default, gcc assumes that not only do you want to compile the
source files you specify but also that you want them linked
together (with each other and with the standard libraries) to
produce an executable.
First, gcc compiles any source files into object files.
Next, it automatically invokes the linker to glue all
the object files and libraries into an executable.
(That's right, the linker is a separate program,
called ld, not part of gcc itself--although it can be said
that gcc and ld are close friends.) gcc also knows
about the "standard" libraries used by most programs and tells
ld to link against them. You can, of course, override these defaults
in various ways.
You can pass multiple filenames in one gcc command, but on
large projects you'll find it more natural to compile a few files at a
time and keep the .o object files around.
If you want only to compile a source file into an object file and
forego the linking process, use the -c switch with gcc, as
in:
papaya$ gcc -c hello.c
This produces the object file hello.o and nothing else.
By default, the linker produces an executable
named, of all things, a.out. By using the -o switch with
gcc, you can force the resulting executable to be named something
different, in this case, hello. This is just a bit of left-over
gunk from early implementations of Unix, and nothing to write home about.
13.1.4. Using Multiple Source Files
The next step on your path to gcc enlightenment is to understand
how to compile programs using multiple source files.
Let's say you have a program consisting of two source files,
foo.c and bar.c. Naturally, you would use one or more
header files (such as foo.h) containing function declarations
shared between the two programs. In this way, code in foo.c
knows about functions in bar.c, and vice versa.
To compile these two source files and link them together (along with the
libraries, of course) to produce the executable baz, you'd
use the command:
papaya$ gcc -o baz foo.c bar.c
This is roughly equivalent to the three commands:
papaya$ gcc -c foo.c
papaya$ gcc -c bar.c
papaya$ gcc -o baz foo.o bar.o
gcc acts as a nice frontend to the linker and other
"hidden" utilities invoked during compilation.
Of course, compiling a program using multiple source files in one command
can be time consuming. If you had, say, five or more source files in
your program, the gcc command in the previous example would
recompile each
source file in turn before linking the executable. This can be a large waste
of time, especially if you only made modifications to a single source file
since last compilation. There would be no reason to recompile the other
source files, as their up-to-date object files are still intact.
The answer to this problem is to use a project manager such as
make. We'll talk about make later in the chapter,
in Section 13.2, "Makefiles."
13.1.5. Optimizing
Telling gcc to optimize your code as it compiles is a simple matter;
just use the -O switch on the gcc command
line:
papaya$ gcc -O -o fishsticks fishsticks.c
As we mentioned not long ago, gcc supports different levels of
optimization. Using -O2 instead of -O will turn on
several "expensive" optimizations that may cause compilation to
run more slowly but will (hopefully) greatly enhance performance of
your code.
You may notice in your dealings with Linux that a number of
programs are compiled using the switch -O6 (the Linux kernel being
a good example). The current version of gcc does not support
optimization up to -O6, so this defaults to (presently)
the equivalent of -O2. However, -O6 is sometimes
used for compatibility with future versions of gcc to ensure that
the greatest level of optimization is used.
13.1.6. Enabling Debugging Code
The -g switch to gcc turns on debugging code in your compiled
object files. That is, extra information is added to the object file,
as well as the resulting executable, allowing the program to be
traced with a debugger such as gdb (which we'll get to later in
the chapter--no worries). The downside to using debugging code is that it
greatly increases the size of the resulting object files. It's usually
best to use -g only
while developing and testing your programs and to leave it out for the
"final" compilation.
Happily, debug-enabled code is not incompatible with code optimization.
This means that you can safely use
the command:
papaya$ gcc -O -g -o mumble mumble.c
However, certain optimizations enabled by -O or -O2 may cause
the program to appear to behave erratically while under the guise of a
debugger. It is usually best to use either -O or -g, not both.
13.1.7. More Fun with Libraries
Before we leave the realm of gcc, a few words on linking and
libraries are in order. For one thing, it's easy for you to create your
own libraries. If you have a set of routines you use often, you
may wish to group them into a set of source files, compile each
source file into an object file, and then create a library from the object
files. This saves you having to compile these routines
individually for each program you use them in.
Let's say you have a set of source files containing oft-used
routines, such as:
float square(float x) {
    /* Code for square()... */
}
int factorial(int x, int n) {
    /* Code for factorial()... */
}
and so on (of course, the gcc standard libraries provide analogs
to these common routines, so don't be misled by our choice of example).
Furthermore, let's say that the code for square() is in the file
square.c and that the code for factorial() is in
factorial.c. Simple enough, right?
To produce a library containing these routines, all that you
do is compile each source file, like so:
papaya$ gcc -c square.c factorial.c
which leaves you with square.o and factorial.o.
Next, create a library from the object files. As it turns out,
a library is just an archive file created using ar (a close
counterpart to tar).
Let's call our library libstuff.a and create it this way:
papaya$ ar r libstuff.a square.o factorial.o
When updating a library such as this, you may need to delete the old
libstuff.a, if it exists.
The last step is to generate an index for the library, which enables the
linker to find routines within the library. To do this, use the
ranlib command, like so:
papaya$ ranlib libstuff.a
This command adds information to the library itself; no separate index
file is created. You could also combine the two steps of running
ar and ranlib by using the
s modifier to ar:
papaya$ ar rs libstuff.a square.o factorial.o
Now you have libstuff.a, a static library containing your routines.
Before you can link programs against it, you'll need to create a header
file describing the contents of the library. For example, we could create
libstuff.h with the contents:
/* libstuff.h: routines in libstuff.a */
extern float square(float);
extern int factorial(int, int);
Every source file that uses routines from
libstuff.a should contain an
#include "libstuff.h" line, as you would do with
standard header files.
Now that we have our library and header file, how do we compile programs
to use them? First of all, we need to put the library and header file
somewhere the compiler can find them. Many users place personal
libraries in the directory lib in their home directory, and
personal include files under include. Assuming we have done
so, we can compile the mythical program wibble.c using the command:
papaya$ gcc -I../include -L../lib -o wibble wibble.c -lstuff
The -I option tells gcc to add the
directory ../include
to the include path it uses to search for include files.
-L is similar, in that it tells gcc to add the directory
../lib to the library path.
The last argument on the command line is -lstuff, which tells
the linker to link against the library libstuff.a (wherever it
may be along the library path). The lib at the beginning of the
filename is assumed for libraries.
Any time you wish to link against libraries other than the standard
ones, you should use the -l switch on the gcc command line.
For example, if you wish to use math routines (specified in math.h),
you should add -lm to the end of the
gcc command, which links against
libm. Note, however, that the
order of
-l options is significant.
For example, if our libstuff library
used routines found in libm, you must include
-lm after -lstuff on the
command line:
papaya$ gcc -I../include -L../lib -o wibble wibble.c -lstuff -lm
This forces the linker to link libm after libstuff,
allowing those unresolved references in libstuff to be taken
care of.
Where does gcc look for libraries? By default, libraries are
searched for in a number of locations, the most important of which is
/usr/lib. If you take a glance at the contents of /usr/lib,
you'll notice it contains many library files--some of which have filenames
ending in .a, others ending in .so.version.
The .a files are static libraries, as is the case with our
libstuff.a. The .so files are shared libraries, which contain code to be linked at runtime, as well as the stub
code required for the runtime linker (ld.so) to locate
the shared library.
At runtime, the program loader looks for shared
library images in several places, including /lib. If you look at
/lib, you'll see files such as libc.so.5.4.47. This is the image
file containing the code for the libc shared library (one of the
standard libraries, which most programs are linked against).
By default, the linker attempts to link against shared libraries.
However, there are several cases in which static libraries are used.
If you enable debugging code with -g, the program will be linked
against the static libraries. You can also specify that static libraries
should be linked by using the -static switch
with gcc.
13.1.7.1. Creating shared libraries
Now that you know how to create
and use static libraries, it's very easy to make the step to
shared libraries. Shared libraries have a number of
advantages. They reduce memory consumption if used by more
than one process, and they reduce the size of the
executable. Furthermore, they make developing easier: when
you
use shared libraries and change some things in a library,
you do not need to recompile and relink your application
each time. You need to relink only if you make incompatible
changes, such as adding arguments to a call or changing the
size of a struct.
Before you start doing all your development work with
shared libraries, though, be warned that debugging with
them is slightly more difficult than with static
libraries, because the debugger usually used on Linux,
gdb, has some problems with shared
libraries.
Code that goes into a shared library needs to be
position independent. This is just
a convention for object code that makes it possible
to use the code in shared libraries. You make
gcc emit
position-independent code by passing it one of the command-line
switches -fpic or
-fPIC (the former is preferred,
unless the modules have grown so large that the relocatable
code table is simply too small, in which case the compiler
will emit an error message and you have to use
-fPIC). To repeat our example
from the last section:
papaya$ gcc -c -fPIC square.c factorial.c
This being done, it is just a simple step to generate
a shared library:[47]
[47]In the ancient days of Linux, creating a shared
library was a daunting task that even wizards were
afraid of. The advent of the ELF object-file format a
few years ago has reduced this task to picking the right
compiler switch. Things sure have improved!
papaya$ gcc -shared -o libstuff.so square.o factorial.o
Note the compiler switch
-shared. There is no indexing step
as with static libraries.
Using our newly created shared library is even
simpler. The shared library doesn't require any change to
the compile command:
papaya$ gcc -I../include -L../lib -o wibble wibble.c -lstuff -lm
You might wonder what the linker does if there is a
shared library libstuff.so and a static
library libstuff.a available. In this
case, the linker always picks the shared library. To make it
use the
static one, you will have to name it explicitly on
the command line:
papaya$ gcc -I../include -L../lib -o wibble wibble.c libstuff.a -lm
Another very useful tool for working
with shared libraries is ldd. It tells you
which shared libraries an executable program uses. Here's an
example:
papaya$ ldd wibble
libstuff.so => libstuff.so (0x400af000)
libm.so.5 => /lib/libm.so.5 (0x400ba000)
libc.so.5 => /lib/libc.so.5 (0x400c3000)
The three fields in each line are the name of the
library, the full path to the instance of the library that
is used, and where in the virtual address space the library
is mapped to.
If ldd outputs "not found" for a certain library, you are in trouble
and won't be able to run the program in question. You will
have to search for a copy of that library. Perhaps it is a
library shipped with your distribution that you opted not to
install, or it is already on your hard
disk, but the loader (the part of the system that loads
every executable program) cannot find it.
In the latter situation, try locating the libraries yourself and
find out whether they're in a nonstandard directory. By default,
the loader looks only in /lib and
/usr/lib. If you have libraries in
another directory, create an environment variable
LD_LIBRARY_PATH and add the
directories separated by colons.
13.1.8. Using C++
If you prefer object-oriented programming, gcc provides complete
support for C++ as well as Objective-C. There are only a few considerations
you need to be aware of when doing C++ programming with gcc.
First of all, C++ source filenames should end in the extension .C
or .cc. This distinguishes them from regular C source filenames,
which end in .c.
Second, the g++ shell script should be used in lieu of gcc
when compiling C++ code. g++ is simply a shell script that invokes
gcc with a number of additional arguments, specifying a link against
the C++ standard libraries, for example. g++ takes the same arguments
and options as gcc.
If you do not use g++, you'll
need to be sure to link against the C++ libraries in order to use any of
the basic C++ classes, such as the cout and cin I/O objects.
Also be sure you have actually installed the C++ libraries and
include files. Some distributions contain only the standard C libraries.
gcc will be able to compile your C++ programs fine, but without
the C++ libraries, you'll end up with linker errors whenever you attempt
to use standard objects.
Copyright © 2001 O'Reilly & Associates. All rights reserved.