CGI (Common Gateway Interface) Programs
Colours and Backgrounds
Directories and Links
Restricting Access to Domains
Restricting Access to Groups
Restricting Access by Password
Restricting Access by Permissions
Server Side Includes
The following gives some general technical advice on creating web pages. The information is a little parochial to the author's Department but should be generally applicable. References to `web server' mean Apache; the features of other web servers are likely to be rather different. Some of the advice is rather subjective as it strays into aesthetics.
You can execute programs as a result of web client enquiries, typically via forms. CGI programs are commonly written in Perl, though any language that yields an executable will do. (The program must, however, be executable on the web server. You might need to compile a binary for the web server, not for the computer you normally use.) Use a package like CGI.pm to support the creation and parsing of forms in Perl. CGI programs live in their own directory, for example /home/me/www/cgi-bin/myscript. Since the server permits CGI scripts to be invoked only from its own cgi-bin directory, a symbolic link (called me in this example) is created there that points to your real cgi-bin directory. The script can then be accessed as http://www.cs.stir.ac.uk/cgi-bin/me/myscript.
There are two major methods to give information to CGI programs: GET and POST. You should read up about HTML/CGI since the details are relatively intricate. (The CGI.pm description is quite informative.) I use CGI, for example, to support web page searching.
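To make the GET/POST distinction concrete, here is a minimal sketch in Python rather than Perl. It assumes the standard CGI environment variables (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH) and a POST body in the usual URL-encoded form format. The function name read_form and the sample requests are made up for illustration; this is not how CGI.pm works internally.

```python
import io
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Return form fields as {name: [values]} for a GET or POST request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the fields arrive in the request body on standard input.
        # Read exactly CONTENT_LENGTH bytes; the server may not send EOF.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii", errors="replace")
    else:
        # GET: the fields arrive URL-encoded after the '?' in the URL.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw, keep_blank_values=True)


# Hypothetical requests, simulated here without a web server.
get_fields = read_form(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "q=gamma&page=2"}, io.BytesIO()
)
post_fields = read_form(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "11"}, io.BytesIO(b"name=A+User")
)
print(get_fields)
print(post_fields)
```

In a real script you would pass os.environ and sys.stdin.buffer, and print a Content-Type header followed by a blank line before any HTML.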
There are several link checkers to find broken links; my preference is checklinks. I have also written some web utilities that may be useful.
Clickable maps are either server-side (supported by CGI programs) or client-side (strictly HTML, and with definite advantages). There are utilities to create an HTML map that gives hot-spots in a clickable image. For example, my Division's floor-plan is a clickable map.
You can set the default colours for background, text and links by saying:
<body bgcolor="#f0f0f0" text="#000000" vlink="#005000" link="#0000ff" alink="#ff0000">
The colours are red-green-blue intensities as hexadecimal bytes (00 = lowest intensity, ff = highest intensity). It is a good idea to set all of the colours in case the client's default for text is, say, your choice of background colour.
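If you prefer to think in decimal intensities, the conversion to and from the hexadecimal form is mechanical. The following Python sketch illustrates it; the function names are my own invention for the example.

```python
def to_hex_colour(red, green, blue):
    """Format three 0-255 intensities as an HTML colour such as #f0f0f0."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("intensity out of range: %r" % (value,))
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)


def from_hex_colour(colour):
    """Split an HTML colour such as #005000 into (red, green, blue)."""
    digits = colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits: %r" % (colour,))
    # Each pair of hexadecimal digits is one byte of intensity.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


background = to_hex_colour(240, 240, 240)
visited = from_hex_colour("#005000")
print(background, visited)
```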
It is also possible to use a background image by giving the <body> tag a background attribute that names the image file.
A background can look fine on colour displays but appear horrible on dithered monochrome displays. (This page may be a good example of the problem!)
It is a good idea to have an index file in every directory since it is the one that is loaded when a URL cites just a directory name. (Strictly speaking, URLs for directories should end in slash. However, omitting the trailing slash is permitted.)
The server will follow symbolic links. For example, I have a common directory of icons that are linked to directories as required. My personal convention is to name the master file in directory mydir as mydir.html. Then I symbolically link it to index.html.
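The naming convention above is easy to automate. Here is a rough Python sketch that links index.html to the master file in a directory. The helper name link_index is hypothetical, and the example works in a temporary directory so that it is harmless to run.

```python
import os
import tempfile


def link_index(directory):
    """Link index.html to <dirname>.html inside directory; return the link path."""
    master = os.path.basename(os.path.normpath(directory)) + ".html"
    link = os.path.join(directory, "index.html")
    # Replace an old link, but never delete a real index.html file:
    # os.symlink raises FileExistsError if a regular file is in the way.
    if os.path.islink(link):
        os.remove(link)
    # A relative target keeps working if the whole directory is moved.
    os.symlink(master, link)
    return link


with tempfile.TemporaryDirectory() as top:
    mydir = os.path.join(top, "mydir")
    os.mkdir(mydir)
    with open(os.path.join(mydir, "mydir.html"), "w") as page:
        page.write("<html>...</html>\n")
    link = link_index(mydir)
    target = os.readlink(link)
    is_link = os.path.islink(link)
    print(target)
```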
Use frames sparingly. A good use, for example, is to have a frame that provides a contents list for the main pages that are shown in a separate frame. Since some (older) browsers may not support frames you should provide an alternative. I use the following approach:
<html>
<frameset ...>
  <frame src="contents.html" name="Contents">
  <frame src="main.html" name="Main">
  <noframes>
    <body>
      <h1>My Page</h1>
      <i>References to subsidiary pages</i>
    </body>
  </noframes>
</frameset>
</html>
GIF and JPEG are universal, and both are compact. There is some controversy over the legality of GIF images due to patent infringement. I personally avoid PS, TIFF and XBM for web pages. PNG is increasingly popular. It's worth creating images in interlaced (progressive, every-nth-line) format. It's best to avoid more than, say, 10 kbytes of images per page so as to help clients on slow (telephone) lines.
Consider giving alternative text for clients that cannot/do not download images. It also helps web browsers if the size of an image is given:
<img src="mypic.gif" alt="My Picture" width="30" height="24">
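Typing in image sizes by hand is tedious, and they can be read from the image file instead. As a sketch, the Python below takes the width and height from the first ten bytes of a GIF (the logical screen size, which for a simple single-image GIF is normally the image size) and builds the tag. The function names are invented for the example, and other formats such as JPEG or PNG would need their own header parsing.

```python
import html
import struct


def gif_size(data):
    """Return (width, height) from the leading bytes of a GIF file."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF header")
    # Bytes 6-9 hold width and height as little-endian 16-bit integers.
    return struct.unpack("<HH", data[6:10])


def img_tag(src, alt, data):
    """Build an <img> tag with alternative text and explicit dimensions."""
    width, height = gif_size(data)
    return '<img src="{}" alt="{}" width="{}" height="{}">'.format(
        html.escape(src, quote=True), html.escape(alt, quote=True), width, height
    )


# A fabricated 30x24 header stands in for the contents of a real mypic.gif.
header = b"GIF89a" + struct.pack("<HH", 30, 24)
tag = img_tag("mypic.gif", "My Picture", header)
print(tag)
```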
Image rendering is subject to gamma correction for brightness. The effect is that a photographic image that looks OK on a Macintosh will look dark on a PC (or vice versa). (SGI, NeXT, ... have other gamma factors.) This is a nasty problem. It is possible to have Macintosh and PC versions of pictures but this is clumsy. An alternative is to use an image format that defines the gamma factor (PNG or TIFF in principle). Image libraries such as LIBTIFF or LIBPNG have functions to adjust gamma in images (e.g. Macintosh to PC optimisation). Your web browser may have an adjustment for gamma correction.
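The arithmetic behind such an adjustment is simple, at least in principle. The Python sketch below remaps a single 8-bit intensity so that an image prepared for one display gamma looks similar on another. The figures of 1.8 for a Macintosh and 2.2 for a PC are commonly quoted values that I am assuming here, not measurements, and real image libraries do considerably more than this.

```python
def regamma(value, source_gamma, target_gamma):
    """Remap an 8-bit intensity prepared for source_gamma for use at target_gamma."""
    if not 0 <= value <= 255:
        raise ValueError("intensity out of range: %r" % (value,))
    normalised = value / 255.0
    # A display shows roughly normalised ** gamma, so to get the same light
    # output we need new ** target_gamma == old ** source_gamma.
    return round(255.0 * normalised ** (source_gamma / target_gamma))


MAC_GAMMA = 1.8  # assumed typical value
PC_GAMMA = 2.2   # assumed typical value

# A mid-grey tuned on a Macintosh must be lightened for a PC.
mid_grey_for_pc = regamma(128, MAC_GAMMA, PC_GAMMA)
print(mid_grey_for_pc)
```

Black and white are left unchanged; only the mid-tones move, which is why photographs suffer most from the problem.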
Each web page should declare its HTML version and language, and indicate who made it. A list of keywords will also help search engines to find appropriate pages. Use a suitable syntax checker to ensure that the HTML is correct. Although browsers are usually very tolerant of errors, it is bad practice to write sloppy HTML. (Unfortunately some applications that generate HTML fall into this category.) Sample boiler-plate might look like:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>My Page</title>
<link rev="made" href="mailto:email@example.com" />
<meta name="keywords" content="subject1 subject2 subject3" />
</head>
<body>
...
</body>
</html>
Some applications that create web pages do not set the HTML page <title>. This is a nuisance when using a web search engine since the page may be cited as `No Title'. The page title should be set to something like the main page header (<h1>).
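It is easy to check generated pages for a missing title before publishing them. The following Python sketch uses the standard html.parser module; the class and function names are my own, and a proper syntax checker will of course catch much more.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of the first <title> element, if there is one."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html_text):
    """Return the page title, or None if it is missing or blank."""
    finder = TitleFinder()
    finder.feed(html_text)
    finder.close()
    title = (finder.title or "").strip()
    return title or None


good = page_title("<html><head><title>My Page</title></head><body><h1>My Page</h1></body></html>")
bad = page_title("<html><head></head><body><h1>My Page</h1></body></html>")
print(good, bad)
```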
Search engines also commonly cite the first few body lines of a page. For this reason it is a good idea if pages start with text that would give a potential user some idea of whether the page would be useful.
Virtually any browser will have scroll bars and a `back to previous page' button. There is therefore little point in providing `to previous page' or `to top of page' links. However, as a result of a web search a user may end up in the middle of a group of pages. Unless you provide some link to other pages, the user will not be able to access them readily. My personal style is to provide a link to the parent of every page. A link to the author's home page will also allow the user to find related pages. It is also good practice to provide an email address on each page so the author can be readily contacted.
Some authors prefer to have a number of interlinked small pages. Others prefer to have single large pages. For the latter, it can be helpful to provide internal links to topics at the start of the page (as with this one).
Users may wish to print pages on a variety of paper sizes such as A4 or US Letter. It is therefore best not to require very large page widths (especially for elements like images and tables).
My favourite HTML hate is light text on a fancy dark background. This looks fine on the screen, but unless your browser prints the background by default the pages will come out white on white.
I am not very keen on pages that make excessive use of frames (e.g. a logo in a top frame and some fixed message in a bottom frame). This can result in a messy page layout. It would be better to include such material at the start and end of each page (say, with a Server Side Include).
I personally dislike page counters since counts are really relevant only to the web page owner, not to the user. There are random `anti-counters' for those who dislike this kind of thing.
Another common problem is web pages that use an image to contain text (say, for a fancy font or for colouring). For this to be usable, the client is forced to download images (which are often thousands of times larger than the equivalent text).
The server looks for a world-readable file called .htaccess in each directory it traverses to get at a web page. The .htaccess file may include access restrictions. For example, the following restricts access to web clients in the cs.stir.ac.uk domain:
<Limit GET>
order deny,allow
deny from all
allow from .cs.stir.ac.uk
</Limit>
This is useful for local information that should not be globally accessible (e.g. from offsite). Note that only directories, not individual files, can be protected in this way.
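The effect of the `order deny,allow' rules can be modelled in a few lines. The Python below is a deliberately simplified sketch of the logic only: it assumes the client's host name has already been looked up, and the function name and its parameters are hypothetical rather than anything Apache provides.

```python
def access_permitted(host, deny_all=True, allow_suffixes=(".cs.stir.ac.uk",)):
    """Simplified model of 'order deny,allow' with domain-suffix allow rules."""
    host = host.lower().rstrip(".")
    # Deny rules are considered first ...
    permitted = not deny_all
    # ... and any matching allow rule then overrides the denial.
    for suffix in allow_suffixes:
        if host.endswith(suffix.lower()):
            permitted = True
    return permitted


onsite = access_permitted("www.cs.stir.ac.uk")
offsite = access_permitted("www.example.com")
print(onsite, offsite)
```

Note that the leading dot in the suffix matters: without it, a host such as notcs.stir.ac.uk would match as well.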
Access can be restricted to a group of users. In this case the .htaccess file might look like:
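AuthType Basic
AuthName "MyPage Authorisation"
AuthUserFile /home/me/mydir/.htpasswd
AuthGroupFile /home/me/mydir/.htgroup
<Limit GET>
require group somegroup
</Limit>
A world-readable group file is named in this case (along with the password file used to authenticate the user). Each line of the group file gives a group name, a colon, and a space-separated list of the usernames in that group.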
This lists the names that are obtained when the user is authenticated by username and password. Access can be restricted to specific users only if they are authenticated in this way.
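For checking a group file by hand, something like the following Python sketch may help. It assumes the usual Apache layout of one `group: user1 user2' line per group; the user names are made up, and skipping blank lines and lines starting with # is my own choice for the example.

```python
def parse_group_file(text):
    """Parse 'group: user1 user2' lines into {group: set of usernames}."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, colon, members = line.partition(":")
        if not colon:
            continue  # not a group line; ignore it
        groups.setdefault(name.strip(), set()).update(members.split())
    return groups


def in_group(groups, group, user):
    """True if the authenticated user is listed in the named group."""
    return user in groups.get(group, set())


# Hypothetical group file contents.
sample = "somegroup: alice bob\nothers: carol\n"
groups = parse_group_file(sample)
print(in_group(groups, "somegroup", "bob"), in_group(groups, "somegroup", "carol"))
```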
Access can be restricted by username and password. In this case the .htaccess file might look like:
AuthType Basic
AuthName "MyPage Authorisation"
AuthUserFile /home/me/mydir/.htpasswd
<Limit GET>
require valid-user
</Limit>
A world-readable password file is named in this case. Each line has the form username:encrypted-password.
The passwords are encrypted using the same algorithm as for normal login. Entries can be created using the `htpasswd user' command available on the Department's Unix machines.
At a minimum, all directories in the path to a web page must be executable by everyone (`rwx--x--x' in Unix terms). You can make your directories readable by everyone if you wish them to be able to find out what is in them. At a minimum, web pages must be readable by everyone (`rwxr--r--' in Unix terms). To deny access to a page but keep it accessible to you, remove read permission for `group' and `others'.
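These rules come down to two permission bits: execute-for-others on every directory in the path, and read-for-others on the page itself. The Python sketch below tests them on numeric modes; the function names are invented for the example, and it considers only the `others' bits, on the assumption that the web server is neither the owner nor in the group.

```python
import stat


def directory_traversable(mode):
    """True if `others' may pass through a directory with this mode."""
    return bool(mode & stat.S_IXOTH)


def page_readable(mode):
    """True if `others' may read a file with this mode."""
    return bool(mode & stat.S_IROTH)


def servable(directory_modes, page_mode):
    """True if every directory on the path and the page itself are accessible."""
    return all(directory_traversable(m) for m in directory_modes) and page_readable(page_mode)


ok = servable([0o711, 0o755], 0o744)       # rwx--x--x, rwxr-xr-x, rwxr--r--
blocked = servable([0o711, 0o700], 0o744)  # one directory is rwx------
private = servable([0o711], 0o700)         # page is rwx------
print(ok, blocked, private)
```

For real files the modes can be obtained with os.stat(path).st_mode.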
It is possible to include a file in a page (say, a standard header or footer) with an #include directive, written as an HTML comment in the same style as the #exec directive.
You can also generate HTML on-the-fly with a program that is executed as the server returns a page. Refer to an executable program (binary, Perl, ...) in an HTML comment of the following form:
<!--#exec cmd="/home/me/mydir/myprog par1 par2"-->
As for CGI programs, the program must be executable on the web server. Note that the program can assume almost nothing about its environment (PATH, etc.) and should be called with an absolute path. A .htaccess control file may be needed in a directory on the path to allow this `Server Side Include'.
As an example, I have a program that generates a uniform footer for each of my web pages like this one. The same program updates web counts, prints file modification time and page URL. I also have a program that consults my diary and prints the current engagement on my home page.
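A footer program of this kind can be very small. The Python sketch below is not my actual program, just an outline of the idea. It assumes the server passes the include variables DOCUMENT_URI and LAST_MODIFIED to the executed command in its environment, and the function name and the HTML layout are invented for the example.

```python
import html


def footer(environ):
    """Return an HTML footer giving the page URL and modification time."""
    # Assumed to be set by the server for Server Side Include commands.
    uri = environ.get("DOCUMENT_URI", "(unknown page)")
    modified = environ.get("LAST_MODIFIED", "(unknown date)")
    return "<hr>\n<p>Page: {}<br>\nLast Update: {}</p>\n".format(
        html.escape(uri), html.escape(modified)
    )


# Simulated environment; a real program would pass os.environ instead.
sample = footer({"DOCUMENT_URI": "/~me/index.html", "LAST_MODIFIED": "18th April 2012"})
print(sample, end="")
```

The program would print this text on its standard output, and the server would substitute it for the #exec comment in the page.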
Up one level to Ken Turner - General Information
Last Update: 18th April 2012