CGI tutorial

This tutorial was originally written for trip, a Swiss diskmag.

CGI-TUTORIAL PART 1

by P. DooM / Carrots (kaufmann@pop.agri.ch)
  1. Introduction
     1.1 Things you have to know
     1.2 What you need
         1.2.1 Server
         1.2.2 Compiler

  2. Go!
     2.1 Hello World!
     2.2 Installing/running CGI's

  3. Output
     3.1 The first counter
     3.2 Files and paths
     3.3 Environment variables

  4. Input
     4.1 Encoded string
     4.2 Cgilib
     4.3 GET/POST

  5. One step down
     5.1 Request a file
     5.2 GET
     5.3 POST

1. Introduction

After reading this tutorial you should be able to write your own CGI-applications and understand how dynamic web-pages work. This text is not complete: there are many other things and tricks to tell about CGI. However, the information in it is sufficient to code commercial applications like counters, guestbooks, chats or little web-games. If you'd like to go deeper into the topic of web-programming, I recommend buying a book about TCP/IP to learn how the internet works, a book about sockets to know how to use TCP/IP, a good HTML-book, and of course a book about CGI :)

1.1 Things you have to know

To understand the examples and concepts in this tutorial, you need some knowledge about HTML because I won't discuss all the basic HTML-tags here. If you have no idea how HTML works or what a tag is, you'd better read a doc first (I recommend 'selfhtml') and start here again after coding some HTML-pages. Of course it's helpful to know C to understand the examples. CGI's can be written in ANY language, see 2., but I wrote the examples in C because it's the language I write my CGI's in. The bigger examples will be in more general pseudo-code anyway, so you should be able to get it even if you're a Pascal-coder ;)

1.2 What you need

Well, the only two basic things you need to code CGI's are a server and a compiler.

1.2.1 Server

I expect that there are not too many people out there with their own web-server at home, so you have to look for other possibilities. When I decided to start coding CGI's I looked for an internet-provider that would support it. Most providers that sell room for private homepages don't support CGI, just because it's too dangerous for them, as they say. Some providers may offer you CGI, but it's possible that they don't let you upload your CGI's yourself but check/compile/upload them themselves instead and charge something like 100 sFr/hour for that! The provider I use now (agri.ch) said that they support CGI when I bought an account, but I had to wait half a year and write about 50 mails until they finally did support it! That was about a year ago now, and some days ago I received a mail saying that they weren't able to support my "special-configuration" any more (they had to put my homepage on a separate web-server :), so I'm CGI-less again at the moment. If you really need CGI and have a lot of $$$, ask any provider if they let you run CGI's if you buy a new domain. A nice and cheap way to get CGI is to ask a little computer-firm that has a permanent internet-connection to let you run your CGI's on one of their servers. (Such firms need the permanent connection for their work and sell space for web-pages on their servers to cover the costs of it. Perhaps they already have enough customers to earn some money, so they sponsor your project and give you the CGI-account for free.)

However, just ask your provider first if they support CGI, and if they say 'no' then just ask again, or ask why not, or send them a bottle of wine and ask again, and finally don't forget to tell me if you were successful :)

1.2.2 Compiler

Of course you need a compiler that fits your server. As most commercial servers run under some M$-thing, it may be useful to own a compiler like 'Visual C' or something like that. Any compiler for any language that can generate 32-bit console applications will do, I think. A CGI is in fact nothing more than a simple .exe-program that writes some text, does some file-operations and reads from the standard-input.

I know, there is also UNIX out there. I concentrate more on the Windows-stuff here, but the basics are the same on every operating system. Perhaps your directories will have different names or you'll have to set the appropriate rights under UNIX.

2. Go!

Some general things first. CGI means 'Common Gateway Interface', but the name doesn't explain what a CGI actually is; it only complicates the whole stuff. So please forget the gateway-thing very quickly, because the only thing you really have to know is: A CGI is a program that runs on a web-server and can be started from somewhere in the net. This sounds nice, but what do you need that for? Let's try to figure out how it started. Imagine you have your own web-server right under your desk, running your private homepage. Now imagine you've coded the server-program yourself, so if someone out there wants to receive a bit of your page, the request is sent to your program and the program sends the content of the file back to the client (e.g. someone from Italy needs the file 'http://your.homepa.ge/img/mydog.gif', the request will look something like 'GIMME IMG/MYDOG.GIF PLEASE!' and your response will be 'HERE IT IS, IT'S A GIF-FILE:', followed by the gif-file). That's no fun, you think, and you decide to modify your program: if for example someone requests the file 'default.htm', you don't just send the content of the file, but a new HTML-file that has been generated by your program, with a joke at the top of the page that has been selected randomly out of a jokes-library and the current date and time written somewhere on the page. And every time someone requests the file, you increase a number in a separate file to count how many times the file has been requested. So you've just invented CGI :)

Of course you don't have to write your own web-server or ugly things like that. It's much easier: If someone requests a file that is a CGI, the CGI is run by the server and the output of the program will be sent back to the client instead of sending the content of the file.

But how does the server know what files are CGI's? This is usually indicated by a special directory that contains only executable CGI's and no .html, .gif, etc...-files. This will then look something like this: '/scripts/hello.exe'

2.1 Hello World!

Enaff blabla for the moment, let's go right into an example. This example is in fact very simple, it does nothing more than a few printf calls:

[ -- CODE 'hello.c' -- ]
#include <stdio.h>

int main(void)
{
  printf("Content-Type: text/html\n\n");   //Header, followed by an empty line
  printf("<html>\n");
  printf("<head>\n");
  printf("  <title>Hello World</title>\n");
  printf("</head>\n");
  printf("<body>\n");
  printf("  <h1>Hello World!</h1>\n");
  printf("  cgi rulez!\n");
  printf("</body>\n");
  printf("</html>\n");
  return 0;
}
[ -- CODE ENDS -- ]
As you can see this program produces this output:
[ -- TERMINAL -- ]
Content-Type: text/html

<html>
<head>
  <title>Hello World</title>
</head>
<body>
  <h1>Hello World!</h1>
  cgi rulez!
</body>
</html>
[ -- TERMINAL ENDS -- ]
This can be split up into two parts: the header and the data. The header consists of the line 'Content-Type: text/html', the data is a simple HTML-file. As you can see, the header and the data are separated by a single empty line. This is in fact what the output of every CGI-program looks like. We will discuss the header (it's a part of the HTTP-header of the server response) later. Let's first figure out how to install and run the program as a CGI.

2.2 Installing/running CGI's

After compiling the program you have to upload the executable to the directory where you are allowed to run CGI's from. Let's take for example the directory '/scripts/arthur/'. To run the CGI you just have to request the file via HTTP, e.g. by typing

  http://www.provid.er/scripts/arthur/hello.exe

in your web-browser. If everything works fine you should see the output of the CGI in your browser. If the browser wants to download the .exe-file, something went wrong and the file has not been recognised as a CGI. (If you are using GetRight you may get some general problems with .exe-files, because it treats them as normal .exe-files you can download! Try renaming the .exe-file to .cgi.)

If everything works fine so far I congratulate you, and you better clean your desk and prepare for a huge phone-bill, because you will be coding CGI's for the rest of your life from now on :)

3. Output

As you have seen in the example above, the output of a CGI is just written to the standard output. If you run the program on your PC, the output will appear on the screen because this is the usual standard output. To redirect the output into a file, you may do something like this:

  hello.exe >output.htm

If the exe produces an HTML-file as output (hello.exe does), you can cut off the 'Content-Type'-header with a text-editor and open the file with a web-browser. This is a simple and useful way to test your CGI's if you aren't running your own CGI-server.
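If you get tired of cutting off the header by hand, a few lines of C can do it. This is only a minimal sketch: 'skip_cgi_header' is a name I made up, and it assumes that the whole CGI-output has already been read into a string in text mode, so that lines end with a plain '\n'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the data part of a CGI
   response, i.e. everything after the first empty line.
   Returns NULL if there is no empty line (no complete header). */
const char *skip_cgi_header(const char *response)
{
  const char *sep;

  assert(response != NULL);

  sep = strstr(response, "\n\n");       /* header ends at first empty line */
  if (sep == NULL)
    return NULL;

  return sep + 2;                       /* skip the two newlines */
}
```

Everything the function returns can be written to a .htm-file and opened directly in the browser.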

3.1 The first counter

hello.c is a really nice program, but it's just stupid to do something like that in a CGI, because the output is always the same. Let's do something more interesting: a little counter. The program will do the following:
[ -- PSEUDOCODE -- ]
  - read the number of visitors from a file
  - increase the number of visitors by 1
  - write number of visitors back to file
  - print 'Content-Type'-header
  - print HTML (containing the count, some blabla-stuff, the date and time)
[ -- PSEUDOCODE ENDS -- ]
Ok. In C, this would look something like this:
[ -- CODE 'count.c' -- ]
#include <stdio.h>
#include <dos.h>                //getdate/gettime: Borland-style DOS compilers

#define COUNTFILE "count.txt"

//Write the number of visitors to the count-file
void WriteN(int n)
{
  FILE *file = fopen(COUNTFILE, "w+b");

  if(file == NULL)
  {                             //File can't be written
    return;
  }

  fprintf(file, "%i", n);

  fclose(file);
}

//Read the number of visitors, 0 if there is no valid count-file yet
int GetN(void)
{
  int n = 0;
  FILE *file = fopen(COUNTFILE, "rb");

  if(file == NULL)
  {                             //File doesn't exist
    return 0;
  }

  if(fscanf(file, "%i", &n) != 1)
  {                             //File is empty or damaged
    n = 0;
  }

  fclose(file);

  return n;
}

int main(void)
{
  int n_visitors;
  struct date d;
  struct time t;

  n_visitors = GetN()+1;        //Get number of visitors and increment it
  WriteN(n_visitors);           //Write the new number back to the file

  getdate(&d);                  //Get date & time
  gettime(&t);

  printf("Content-Type: text/html\n\n");

  printf("<html>\n");
  printf("<head>\n");
  printf("  <title>an example</title>\n");
  printf("</head>\n");
  printf("<body>\n");
  printf("  <h1>welcome to my homepage!</h1>\n");
  printf("  Hi! You are visitor number %i.<p>\n", n_visitors);
  printf("  The current year is: %d<br>\n", d.da_year);
  printf("  The current day is: %d<br>\n", d.da_day);
  printf("  The current month is: %d<br>\n", d.da_mon);
  printf("  The current time is: %2d:%02d:%02d.%02d<br>\n", t.ti_hour, t.ti_min,
                                                        t.ti_sec, t.ti_hund);
  printf("</body>\n");
  printf("</html>\n");
  return 0;
}
[ -- CODE ENDS -- ]
The output may look like this:
[ -- TERMINAL -- ]
Content-Type: text/html

<html>
<head>
  <title>an example</title>
</head>
<body>
  <h1>welcome to my homepage!</h1>
  Hi! You are visitor number 5.<p>
  The current year is: 1998<br>
  The current day is: 18<br>
  The current month is: 8<br>
  The current time is: 18:42:42.39<br>
</body>
</html>
[ -- TERMINAL ENDS -- ]

3.2 Files and paths

That was also quite easy. However, this is already a bit more complex than hello.c, because it has a file-operation in it. You have to make sure that the CGI can write/read files! Perhaps the directory where it can do that is not the directory the CGI is in. If the CGI is in /scripts/arthur/, the data-directory may be /scripts/data/ or something like that. So please adjust the path in COUNTFILE, and if the server is running under Windows don't forget to use '\' in the path instead of '/' (written as '\\' inside a C string)!

If you don't have a directory with writing permission, there is still a very bad and ugly way to write/read data. You could implement an FTP-client in your CGI's that connects to itself (the CGI-server) and does all the file-operations via FTP. Ugly, but it works :)

If you do have such a directory, however, you should check if you can read from it via HTTP, e.g. with

  http://www.provid.er/scripts/data/thesecretdata.dat

from your web-browser. This has some advantages, but also a very big disadvantage. The advantage is that you can write a .html-file there and request it directly from the client, without running a CGI first. This is sometimes useful, for example in a guestbook. You'll have a file like /guestbook.htm in the data-directory that will be updated by a CGI and requested directly via HTTP. The disadvantage is that you can't safely store secret data there. If you're planning to code a CGI-chat with user accounts, you shouldn't store the usernames and passwords there because everyone can download them :) The most convenient method would be to have two different directories, 'pubdata' and 'privdata'.

3.3 Environment variables

That was also very nice, but now we'll do something that is more fun and that shows the real power of CGI: the environment variables. This already belongs a bit to the chapter 'Input', because it's a way to get information from the computer that calls your CGI. It works like this: As you know from DOS, you can set environment variables with the DOS-command 'set' (e.g. 'SET NAME=ARTHUR'). In C you can read the value of these variables with the 'getenv' function. A little example:

[ -- CODE 'envdemo.c' -- ]
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  char *name;
  name = getenv("NAME");
  if(name == NULL)
  {                             //NAME is not set
    name = "(not set)";
  }
  printf("%s\n", name);
  return 0;
}
[ -- CODE ENDS -- ]
The trick is: Before the server runs a CGI-program, it sets some environment-variables to nice values. The most interesting ones are:
GATEWAY_INTERFACE       Version of CGI protocol  (usually CGI/1.1)
SERVER_PROTOCOL         Version of HTTP protocol (usually HTTP/1.0)
REQUEST_METHOD          Is 'GET' or 'POST' (See chapter 'Input')
PATH_TRANSLATED         Path on the server
QUERY_STRING            Input-data if REQUEST_METHOD = GET (See 'Input')
CONTENT_LENGTH          Length of input for REQUEST_METHOD = POST (See 'Input')
SERVER_SOFTWARE         Name and version of server software
SERVER_NAME             Name of the server (DNS-name)
SERVER_ADMIN            E-mail address of system-administrator
SERVER_PORT             HTTP-port of server (usually 80)
SCRIPT_NAME             Path and name of CGI-program
REMOTE_HOST             Name of the client (or IP if no name)
REMOTE_ADDR             IP of the client
REMOTE_USER             Name of the user
REMOTE_GROUP            Group of the user
HTTP_ACCEPT             List of MIME-types the client accepts
HTTP_USER_AGENT         Name, version and OS of the client's browser
HTTP_REFERER            URL the client visited before it ran the CGI
HTTP_ACCEPT_LANGUAGE    Supported language
HTTP_COOKIE             Cookie-values
As you see, you can find out quite a lot about the guy who starts your CGI-program with his browser. But the environment-variables are also essential for the CGI-input. See next chapter for more info. Here's a little program that sends the values of all environment-vars back to the client. 'text/plain' is used here as content-type, so the response is in plain text.
[ -- CODE 'getenv.c' -- ]
#include <stdio.h>
#include <stdlib.h>

//Print one environment-variable, or '(null)' if it isn't set
void Get(const char *s)
{
  const char *value = getenv(s);

  printf("'%s' = '%s'\n", s, value != NULL ? value : "(null)");
}

int main(void)
{
  printf("Content-Type: text/plain\n\n");

  Get("GATEWAY_INTERFACE");
  Get("SERVER_PROTOCOL");
  Get("REQUEST_METHOD");
  Get("PATH_INFO");
  Get("PATH_TRANSLATED");
  Get("QUERY_STRING");
  Get("CONTENT_TYPE");
  Get("CONTENT_LENGTH");
  Get("SERVER_SOFTWARE");
  Get("SERVER_NAME");
  Get("SERVER_ADMIN");
  Get("SERVER_PORT");
  Get("SCRIPT_NAME");
  Get("DOCUMENT_ROOT");
  Get("REMOTE_HOST");
  Get("REMOTE_ADDR");
  Get("REMOTE_USER");
  Get("REMOTE_GROUP");
  Get("AUTH_TYPE");
  Get("REMOTE_IDENT");
  Get("HTTP_ACCEPT");
  Get("HTTP_USER_AGENT");
  Get("HTTP_REFERER");
  Get("HTTP_ACCEPT_LANGUAGE");
  Get("HTTP_COOKIE");
  return 0;
}
[ -- CODE ENDS -- ]
... and here a possible output:
[ -- TERMINAL -- ]
Content-Type: text/plain

'GATEWAY_INTERFACE' = 'CGI/1.1'
'SERVER_PROTOCOL' = 'HTTP/1.0'
'REQUEST_METHOD' = 'GET'
'PATH_INFO' = '(null)'
'PATH_TRANSLATED' = 'C:\internet\InetPub\wwwroot'
'QUERY_STRING' = 'name=Arthur&var2=blabla&var3=hellou'
'CONTENT_TYPE' = '(null)'
'CONTENT_LENGTH' = '0'
'SERVER_SOFTWARE' = 'Microsoft-IIS/3.0'
'SERVER_NAME' = 'www2.agri.ch'
'SERVER_ADMIN' = '(null)'
'SERVER_PORT' = '80'
'SCRIPT_NAME' = '/scripts/arthur/getenv.exe'
'DOCUMENT_ROOT' = '(null)'
'REMOTE_HOST' = '194.6.166.201'
'REMOTE_ADDR' = '194.6.166.201'
'REMOTE_USER' = '(null)'
'REMOTE_GROUP' = '(null)'
'AUTH_TYPE' = '(null)'
'REMOTE_IDENT' = '(null)'
'HTTP_ACCEPT' = 'image/gif, image/x-xbitmap, image/jpeg,
image/pjpeg, application/vnd.ms-excel, application/msword, */*'
'HTTP_USER_AGENT' = 'Mozilla/2.0 (compatible; MSIE 3.0; SK; Windows 95)'
'HTTP_REFERER' = '(null)'
'HTTP_ACCEPT_LANGUAGE' = 'de'
'HTTP_COOKIE' = '(null)'
[ -- TERMINAL ENDS -- ]
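
As the output shows, many of the variables are simply not set, and getenv then returns NULL. In a real CGI you probably want a default value instead. A small sketch of how that could look; 'env_or' and 'is_post_request' are names I invented for this example, they are not part of any library:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: like getenv(), but never returns NULL. */
const char *env_or(const char *name, const char *fallback)
{
  const char *value;

  assert(name != NULL && fallback != NULL);

  value = getenv(name);
  return (value != NULL) ? value : fallback;
}

/* Returns 1 if the CGI was started by a POST request, 0 otherwise. */
int is_post_request(void)
{
  return strcmp(env_or("REQUEST_METHOD", ""), "POST") == 0;
}
```

With this, something like env_or("HTTP_USER_AGENT", "unknown browser") can be passed straight to printf without crashing when the variable is missing.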

4. Input

What I call CGI-input here is the data you send to a CGI with your browser. This is usually done with the HTML <form>-tag. I expect that you know the major options and sub-tags of <form> like <input ...> (have a look at an HTML-doc if that doesn't ring a bell).

4.1 Encoded string

Basically it works like this: all the <input ...>'s, their names and values, are stored in one string that is encoded like this:

  varname1=value1&varname2=value2&varname3=value3...

Special characters are translated into their ASCII-code in hex, so '(' becomes '%28' for example. Space (' ') can be translated into '%20' or '+', so the string

  'hello world(20)'

encodes into:

  'hello+world%2820%29'
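
Going the other way, from the encoded string back to the original, is what every CGI has to do with its input. Here is a minimal sketch of such a decoder. The function names are made up for this example, and a malformed '%' that is not followed by two hex digits is simply copied unchanged, which is my own choice and not a rule from any standard.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes 'src' into 'dst'.
   'dst' needs room for strlen(src)+1 bytes (decoding never grows). */
void url_decode(const char *src, char *dst)
{
  assert(src != NULL && dst != NULL);

  while (*src != '\0')
  {
    if (*src == '+')
    {
      *dst++ = ' ';                      /* '+' stands for a space */
      src++;
    }
    else if (*src == '%' && isxdigit((unsigned char)src[1])
                         && isxdigit((unsigned char)src[2]))
    {
      char hex[3];
      hex[0] = src[1];
      hex[1] = src[2];
      hex[2] = '\0';
      *dst++ = (char)strtol(hex, NULL, 16);   /* %XX -> one character */
      src += 3;
    }
    else
    {
      *dst++ = *src++;                   /* ordinary char or malformed '%' */
    }
  }
  *dst = '\0';
}

/* Same thing, but returns a freshly allocated string (or NULL).
   The caller has to free() it. */
char *url_decode_dup(const char *src)
{
  char *dst;

  assert(src != NULL);

  dst = malloc(strlen(src) + 1);
  if (dst != NULL)
    url_decode(src, dst);
  return dst;
}
```

Feeding it 'hello+world%2820%29' gives back 'hello world(20)'.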

4.2 Cgilib

This stuff perhaps looks a little bit weird now, so let's just make an example. The <form> in the HTML-file looks like this:

[ -- HTML -- ]
  <form action="http://www.provid.er/scripts/arthur/libtest.exe" method="get">
    Your Name: <input type="text" name="name" size=80 maxlength=100><br>
    Your age: <input type="text" name="age" size=80 maxlength=100><br>
    <input type="submit" value="Go!">
  </form>
[ -- HTML ENDS -- ]
If you enter 'Arthur Dent' as name and '42' as age, the browser will produce the following string:

  name=Arthur+Dent&age=42

The big question is: How/where/when/why does the CGI receive this string?? In the example above you may have noticed the 'method="get"' in the <form>-tag. If we use the get-method, the string will be stored in the environment-variable 'QUERY_STRING'! But how do we know that the get-method was used? Easy, just check 'REQUEST_METHOD', which contains either the string 'GET' or 'POST', depending on the method.

The only thing we need now is a parser for the QUERY_STRING. Fortunately this has already been done by some kind people. I'll use a modified extract from the library 'cgilib' by Eugene Eric Kim here (see cgilib.c, cgilib.h). The concept of it is to have a list of entries, a function to read the input from the QUERY_STRING into the list and some functions to get the value of a certain variable out of the list. This little example should explain everything:

[ -- CODE 'libtest.c' -- ]
#include <stdio.h>
#include "cgilib.h"

int main(void)
{
  llist list;
  char *name, *age;

  read_cgi_input(&list);        //Parse the input (GET or POST) into the list

  name = cgi_val(list, "name"); //Look up the values by variable-name
  age  = cgi_val(list, "age");

  printf("Content-Type: text/html\n\n");
  printf("<html>\n");
  printf("<head>\n");
  printf("  <title>Input</title>\n");
  printf("</head>\n");
  printf("<body>\n");
  printf("  <h1>Input</h1>\n");
  printf("  You entered the following data:<br>\n");
  printf("  name:  '%s'<br>\n", name);
  printf("  age: %s<br>\n", age);
  printf("</body>\n");
  printf("</html>\n");
  return 0;
}
[ -- CODE ENDS -- ]
As you can see it's really very easy: All you have to do is call 'read_cgi_input' at the start of the program and read the values with 'cgi_val'. With this lib, you don't have to care about GET or POST (we will come to that soon) or how to parse the string. It even provides a function to test your CGI-programs: If REQUEST_METHOD is not set, you can enter the query-string directly, so you will enter something like 'name=Arthur+Dent&age=42' there. Try it out yourself...
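
If you are curious what such a parser has to do, here is a rough sketch of the lookup step. This is NOT the code from cgilib, just my own minimal illustration with an invented function name. It finds one variable in the encoded string and copies its value, still encoded, into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cgilib: looks up 'key' in an encoded
   string like "name=Arthur+Dent&age=42" and copies the still-encoded
   value into 'out' (at most outsize-1 chars, always terminated).
   Returns 1 if the key was found, 0 otherwise. */
int query_value(const char *query, const char *key, char *out, size_t outsize)
{
  size_t keylen;
  const char *p;

  assert(query != NULL && key != NULL && out != NULL && outsize > 0);

  keylen = strlen(key);
  p = query;

  while (*p != '\0')
  {
    const char *end = strchr(p, '&');    /* end of this name=value pair */
    size_t pairlen = (end != NULL) ? (size_t)(end - p) : strlen(p);

    if (pairlen > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=')
    {
      size_t vallen = pairlen - keylen - 1;
      if (vallen >= outsize)
        vallen = outsize - 1;            /* truncate to fit the buffer */
      memcpy(out, p + keylen + 1, vallen);
      out[vallen] = '\0';
      return 1;
    }

    p += pairlen;
    if (*p == '&')
      p++;                               /* skip the separator */
  }

  out[0] = '\0';
  return 0;
}
```

A real parser would also decode the '+' and '%XX' sequences in the value afterwards; the sketch leaves that out to stay short.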

4.3 GET/POST

As mentioned above, the input-data is put in the environment-variable QUERY_STRING if you use the get-method. The other method to send data to the CGI is POST. In case of POST, 'REQUEST_METHOD' will have the value 'POST', and the data won't be in the QUERY_STRING; it comes in through the standard-input instead! Huh? The standard-input is normally the keyboard, but for a CGI the server feeds the data in there. You can read from it with

  fread(buffer,sizeof(char),content_length,stdin);

'buffer' is a pointer to a char-buffer with room for at least 'content_length' bytes. 'content_length' is the length of the input-data; you get it by converting the environment-variable 'CONTENT_LENGTH' (a string) to a number.
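
Put together, reading the POST-data could look roughly like the sketch below. The function name and the size limit of 64 KB are my own inventions for this example. The stream is passed in as a parameter so the function can be tried out with a normal file instead of the real standard-input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads 'content_length' bytes (the text from the
   CONTENT_LENGTH variable) from 'in' into a freshly allocated,
   zero-terminated buffer. Returns NULL on any error.
   The caller has to free() the result. */
char *read_post_data(FILE *in, const char *content_length)
{
  long length;
  char *buffer;

  assert(in != NULL);

  if (content_length == NULL)
    return NULL;                         /* variable not set */

  length = strtol(content_length, NULL, 10);
  if (length < 0 || length > 65536L)     /* arbitrary limit for this sketch */
    return NULL;

  buffer = malloc((size_t)length + 1);
  if (buffer == NULL)
    return NULL;

  if (fread(buffer, sizeof(char), (size_t)length, in) != (size_t)length)
  {
    free(buffer);                        /* got less data than announced */
    return NULL;
  }

  buffer[length] = '\0';
  return buffer;
}
```

In a CGI you would call it as read_post_data(stdin, getenv("CONTENT_LENGTH")).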

So, why is there something like POST, why is GET not sufficient? Let's look at the way GET works. You've perhaps already seen an URL like this:

  http://www.provid.er/scripts/arthur/libtest.exe?name=Arthur+Dent&age=42

The stuff after the '?' is nothing else but the encoded input-string! So you can test your CGI's without making an HTML with a form in it that you have to fill out every time. Just add the string directly after the URL to the CGI, and put a '?' in between. That is one point for GET. But this has also a big disadvantage: Because the string is added to the URL, the length of the data is limited! (Don't ask me how much it is limited, perhaps 256 bytes, perhaps 1024, it's just not enough if you need a 10MB input :)
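
If you build such test-URLs in a program instead of typing them, the values have to be encoded first. A minimal sketch of an encoder follows. The function name is invented, and the set of characters that are copied unchanged (letters, digits, '-', '_' and '.') is a cautious choice of mine. The buffer check is deliberately conservative, so it may refuse a buffer that would just barely have been big enough.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encodes 'src' into 'dst' (size 'dstsize').
   Letters, digits, '-', '_' and '.' are copied, a space becomes '+',
   everything else becomes %XX.
   Returns 0 on success, -1 if 'dst' is (probably) too small. */
int url_encode(const char *src, char *dst, size_t dstsize)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t used = 0;

  assert(src != NULL && dst != NULL);

  if (dstsize == 0)
    return -1;

  for (; *src != '\0'; src++)
  {
    unsigned char c = (unsigned char)*src;

    if (used + 4 > dstsize)              /* worst case: 3 chars + '\0' */
      return -1;

    if (isalnum(c) || strchr("-_.", c) != NULL)
      dst[used++] = (char)c;
    else if (c == ' ')
      dst[used++] = '+';
    else
    {
      dst[used++] = '%';
      dst[used++] = hex[c >> 4];         /* high four bits */
      dst[used++] = hex[c & 15];         /* low four bits */
    }
  }

  dst[used] = '\0';
  return 0;
}
```

Encoding 'Arthur Dent' this way gives 'Arthur+Dent', which can be put behind 'name=' in the URL.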

In everyday-use you should prefer POST. It has also the advantage that the data is not visible in the URL, so you better submit data like passwords or creditcard-numbers with POST :). As you have seen, it doesn't matter at the end what method you are using (for little input-data), because 'read_cgi_input' in cgilib.c does all the ugly parsing-stuff.

5. One step down

I hope you are not satisfied with the GET/POST-explanations above (if you are, you aren't a real coder :). Here we will have a look at the HTTP-protocol and find out how GET/POST works on this low level. To test the stuff it might be useful to own a terminal-emulation program like 'terminal' that comes with every serious operating system. The port is always 80 (the HTTP-port).

5.1 Request a file

What happens when you enter 'http://www.yahoo.com' in your browser? First, it connects to www.yahoo.com, port 80. You can do the same with your terminal-program, just enter www.yahoo.com as server and port 80. If you can't get the terminal to work, enter 'telnet://www.yahoo.com:80' as URL in your browser.

After connection, enter the following:

[ -- TERMINAL -- ]
GET / HTTP/1.0

[ -- TERMINAL ENDS -- ]
(the string 'GET / HTTP/1.0', followed by pressing Enter twice). Write everything in capital letters and don't forget the two Enters! The server will send you the content of an HTML-file as response, and that is the data your browser displays!

5.2 GET

Let's try out something different. Connect to port 80 of the server where your CGI-programs are located and enter the following:
[ -- TERMINAL -- ]
GET /scripts/libtest.exe?name=arthur&age=42 HTTP/1.0

[ -- TERMINAL ENDS -- ]
(Assuming that libtest.exe is located in /scripts/). The response is the CGI-output, and you see here that the encoded input-data is just added to the path/filename of the requested file! The rest is done by the server itself.
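
What you type into the terminal can of course also be put together by a program. The sketch below only builds the request text; sending it over a socket is left out. The function name is made up, and it assumes a compiler that knows snprintf (C99). Each Enter in the terminal corresponds to a '\r\n' in the string, which is the line ending HTTP uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a complete HTTP/1.0 GET request into 'buf'.
   'query' is the already encoded input-string, or NULL for none.
   Returns the length of the request, or -1 if 'buf' is too small. */
int build_get_request(char *buf, size_t bufsize,
                      const char *path, const char *query)
{
  int n;

  assert(buf != NULL && path != NULL);

  if (query != NULL && strlen(query) > 0)
    n = snprintf(buf, bufsize, "GET %s?%s HTTP/1.0\r\n\r\n", path, query);
  else
    n = snprintf(buf, bufsize, "GET %s HTTP/1.0\r\n\r\n", path);

  if (n < 0 || (size_t)n >= bufsize)
    return -1;                           /* request did not fit */
  return n;
}
```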

5.3 POST

Let's request the same CGI again, but with the POST-method instead of GET this time. Enter:
[ -- TERMINAL -- ]
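POST /scripts/libtest.exe HTTP/1.0
Content-Length: 19

name=P.+DooM&age=18
[ -- TERMINAL ENDS -- ]
... and that's the way POST works: the header lines come first, then an empty line, then exactly 'Content-Length' bytes of encoded input-data.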
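
The annoying part of a POST request typed by hand is counting the bytes for 'Content-Length'. A program can do that for you. Again this is only a sketch that builds the request text, with an invented function name and assuming snprintf is available. The 'Content-Type' line is an addition of mine: browsers send it with form data, and some servers or CGI-libraries may expect it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a complete HTTP/1.0 POST request into 'buf'.
   'data' is the already encoded input-string.
   Returns the length of the request, or -1 if 'buf' is too small. */
int build_post_request(char *buf, size_t bufsize,
                       const char *path, const char *data)
{
  int n;

  assert(buf != NULL && path != NULL && data != NULL);

  n = snprintf(buf, bufsize,
               "POST %s HTTP/1.0\r\n"
               "Content-Type: application/x-www-form-urlencoded\r\n"
               "Content-Length: %lu\r\n"
               "\r\n"                    /* empty line: end of the headers */
               "%s",
               path, (unsigned long)strlen(data), data);

  if (n < 0 || (size_t)n >= bufsize)
    return -1;                           /* request did not fit */
  return n;
}
```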

For more info about the HTTP-protocol and deeper insight in the way the whole www-stuff is handled, check out the appropriate RFC's.

[TO BE CONTINUED!]



'98 by P. DooM