Getting Started with CGI Programming in C

Content

This is an introduction to writing CGI programs in the C language. The reader is assumed to know the basics of C as well as how to write simple forms in HTML and to be able to install CGI scripts on a Web server. The principles are illustrated with very simple examples.

Important warnings:

Why CGI programming?

As my document How to write HTML forms briefly explains, you need a server-side script in order to use HTML forms reliably. Typically, there are simple server-side scripts available for simple, common ways of processing form submissions, such as sending the data in text format by E-mail to a specified address.

However, for more advanced processing, such as collecting data into a file or database, or retrieving information and sending it back, or doing some calculations with the submitted data, you will probably need to write a server-side script of your own.

CGI is simply an interface between HTML forms and server-side scripts. It is not the only possibility; see the excellent tutorial How the web works: HTTP and CGI explained by Lars Marius Garshol for both an introduction to the concepts of CGI and notes on other possibilities.

If someone suggests using JavaScript as an alternative to CGI, ask them to read my JavaScript and HTML: possibilities and caveats. Briefly, JavaScript is inherently unreliable, at least if not “backed up” with server-side scripting.

A basic example

The above-mentioned How the web works: HTTP and CGI explained is a great tutorial. The following introduction of mine is just another attempt to present the basics; please consult other sources if you get confused or need more information.

Let us consider the following simple HTML form:

<form action="http://www.example/cgi-bin/mult.cgi">
<div><label>Multiplicand 1: <input name="m" size="5"></label></div>
<div><label>Multiplicand 2: <input name="n" size="5"></label></div>
<div><input type="submit" value="Multiply!"></div>
</form>

With an input of 4 and 9, you would get the following result:

Multiplication results

The product of 4 and 9 is 36.

Analysis of the example

We will now analyze how the example above works.

Assume that you type 4 into one input field and 9 into another and then invoke submission, typically by clicking on a submit button. Your browser will send, by the HTTP protocol, a request to the server www.example (this is not a real server, just a name used as an example). The browser picks up the server name from the value of the ACTION attribute, where it occurs as the host name part of a URL. (Quite often, the ACTION attribute refers, usually with a relative URL, to a script on the same server as the document resides on, but this is not necessary, as this example shows.)

When sending the request, the browser provides additional information, specifying a relative URL, in this case
/cgi-bin/mult.cgi?m=4&n=9
This was constructed from that part of the ACTION value that follows the host name, by appending a question mark “?” and the form data in a specifically encoded format.

The server to which the request was sent (in this case, www.example) will then process it according to its own rules. Typically, the server’s configuration defines how the relative URLs are mapped to file names and which directories/folders are interpreted as containing CGI scripts. As you may guess, the part cgi-bin/ in the URL causes such interpretation in this case. This means that instead of just picking up and sending back (to the browser that sent the request) an HTML document or some other file, the server invokes a script or a program specified in the URL (mult.cgi in this case) and passes some data to it (the data m=4&n=9 in this case).

It depends on the server how this really happens. In this particular case, the server actually runs the (executable) program in the file mult.cgi in the subdirectory cgi-bin of user jkorpela’s home directory. It could be something quite different, depending on server configuration.

So what is CGI programming?

The often-mystified abbreviation CGI, for Common Gateway Interface, refers just to a convention on how the invocation and parameter passing take place in detail.

Invocation means different things in different cases. For a Perl script, the server would invoke a Perl interpreter and make it execute the script in an interpretive manner. For an executable program, which has typically been produced by a compiler and a loader from a source program in a language like C, it would just be started as a separate process.

Although the word script typically suggests that the code is interpreted, the term CGI script refers both to such scripts and to executable programs. See the answer to question Is it a script or a program? in CGI Programming FAQ by Nick Kew.

Using a C program as a CGI script

In order to set up a C program as a CGI script, it needs to be turned into a binary executable program. This is often problematic, since people largely work on Windows whereas servers often run some version of UNIX or Linux. The system where you develop your program and the server where it should be installed as a CGI script may have quite different architectures, so that the same executable does not run on both of them.

This may create an unsolvable problem. If you are not allowed to log on to the server and you cannot use a binary-compatible system (or a cross-compiler) either, you are out of luck. Many servers, however, allow you to log on and use the server in interactive mode, as a “shell user,” and contain a C compiler.

You need to compile and load your C program on the server (or, in principle, on a system with the same architecture, so that binaries produced for it are executable on the server too).

Normally, you would proceed as follows:

  1. Compile and test the C program in normal interactive use.
  2. Make any changes that might be needed for use as a CGI script. The program should read its input according to the intended form submission method. Using the default GET method, the input is to be read from the environment variable QUERY_STRING. (The program may also read data from files, but these must then reside on the server.) It should generate output on the standard output stream (stdout) so that it starts with suitable HTTP headers. Often, the output is in HTML format.
  3. Compile and test again. In this testing phase, you might set the environment variable QUERY_STRING so that it contains the test data as it will be sent as form data. E.g., if you intend to use a form where a field named foo contains the input data, you can give the command
    setenv QUERY_STRING "foo=42" (when using the tcsh shell)
    or
    export QUERY_STRING="foo=42" (when using the bash shell; without export, the variable is not passed on to the program).
  4. Check that the compiled version is in a format that works on the server. This may require a recompilation. You may need to log on to the server computer (using Telnet, SSH, or some other terminal emulator) so that you can use a compiler there.
  5. Upload the compiled and loaded program, i.e. the executable binary program (and any data files needed), to the server.
  6. Set up a simple HTML document that contains a form for testing the script, etc.

You need to put the executable into a suitable directory and name it according to server-specific conventions. Even the compilation commands needed here might differ from what you are used to on your workstation. For example, if the server runs some flavor of Unix and has the GNU C compiler available, you would typically use a compilation command like gcc -o mult.cgi mult.c and then move (mv) mult.cgi to a directory with a name like cgi-bin. Instead of gcc, you might need to use cc. You really need to check local instructions for such issues.

The filename extension .cgi has no fixed meaning in general. However, there can be server-dependent (and operating system dependent) rules for naming executable files. Typical extensions for executables are .cgi and .exe.

The Hello world test

As usual when starting work with some new programming technology, you should probably first make a trivial program work. This avoids fighting with many potential problems at a time and lets you concentrate first on the issues specific to the environment, here CGI.

You could use the following program that just prints Hello world but preceded by HTTP headers as required by the CGI interface. Here the header specifies that the data is plain ASCII text.

#include <stdio.h>
int main(void) {
  printf("Content-Type: text/plain;charset=us-ascii\n\n");
  printf("Hello world\n\n");
  return 0;
}

After compiling, loading, and uploading, you should be able to test the script simply by entering the URL in the browser’s address bar. You could also make it the destination of a normal link in an HTML document.

Form for submitting data

(80 chars max.):

Form for checking submitted data

The content of the text file to which the submissions are stored will be displayed as plain text.

Even though the output is declared to be plain text, Internet Explorer may interpret it partly as containing HTML markup. Thus, if someone enters data that contains such markup, strange things would happen. The viewdata.c program takes this into account by writing the NUL character ('\0') after each occurrence of the less-than character (<), so that it will not be taken (even by IE) as starting a tag.

Further reading

You may now wish to read The CGI specification, which tells you all the basic details about CGI. The next step is probably to see what the CGI Programming FAQ contains. Beware that it is relatively old.

There is a lot of material, including introductions and tutorials, in the CGI Resource Index. Notice in particular the section Programs and Scripts: C and C++: Libraries and Classes, which contains libraries that can make it easier to process form data. It can be instructive to parse a simple data format by using code of your own, as was done in the simple examples above, but in practical applications a library routine might be better.

The C language was originally designed for an environment where only ASCII characters were used. Nowadays, it can be used—with caution—for processing 8-bit characters. There are various ways to overcome the limitation that in C implementations, a character is generally an 8-bit quantity. See especially the last section in my book Unicode Explained.


Last modified 2010-06-16 and 2017-10-02.