HyperText Transfer Protocol Design Issues
See also: Why a new protocol?, HTTP as proposed, and the HTTP protocol
as currently implemented.
Here are some design decisions to be made for protocols used for
hypertext information retrieval.
Underlying protocol
There are several distinct possible bases for the protocol. We can
choose:
- Something based on, and looking like, an Internet protocol. This has
the advantage of being well understood, and of existing implementations
being widespread. It also leaves open the possibility of a
universal FTP/HTTP or NNTP/HTTP server. This is the case for the current
HTTP.
- Something based on an RPC standard. This has the advantages that the
code is easy to generate, that the parsing of the messages is
done automatically, and that the transfer of binary data is efficient.
It has the disadvantage that the RPC code must be available
on all platforms, and one would have to choose one (or more) styles of
RPC. Another disadvantage may be that existing RPC systems are not
efficient at transferring large quantities of text over a stream protocol
unless (like DD-OC-RPC) one has a let-out and can access the socket
directly.
- Something based on the OSI stack, as is Z39.50. This would have to
be run over TCP in the internet world.
Current HTTP uses the first alternative, to make it simple to program,
so that it will catch on: conversion to run over an OSI stack will
be simple as the structure of the messages is well defined.
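As an illustration of the first alternative, a request can be a single line of text in the style of FTP and NNTP commands. The sketch below assumes a keyword followed by space-separated arguments and terminated by CR LF; the function names `format_command` and `parse_command` and the example address are hypothetical, not part of any specification.

```python
CRLF = "\r\n"  # line terminator used by FTP and NNTP command lines


def format_command(keyword: str, *args: str) -> str:
    """Build one command line, e.g. 'GET /example/doc.html' followed by CR LF."""
    for arg in args:
        # Arguments are space-separated, so in this simple form they may
        # not be empty or contain spaces or line breaks.
        if not arg or any(ch in arg for ch in " \r\n"):
            raise ValueError(f"unsafe argument: {arg!r}")
    return " ".join((keyword.upper(),) + args) + CRLF


def parse_command(line: str) -> tuple[str, list[str]]:
    """Split a received command line into (KEYWORD, [arguments])."""
    if not line.endswith(CRLF):
        raise ValueError("command line must end with CR LF")
    fields = line[: -len(CRLF)].split(" ")
    if not fields[0]:
        raise ValueError("empty command")
    return fields[0].upper(), fields[1:]


if __name__ == "__main__":
    wire = format_command("get", "/example/doc.html")
    print(repr(wire))
    print(parse_command(wire))
```

Because each request is one readable line, it can be typed by hand over a plain terminal connection when debugging a server, which is part of what makes this style simple to program.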
Another choice is whether to make the protocol idempotent or not.
That is, does the server need to keep any state information about the
client? (For example, the NFS protocol is idempotent, but the FTP
and NNTP protocols are not.) In the case of FTP the state information
consists of authorisation, which is not trivial to establish every
time but could be, and the current directory and transfer mode, which are
basically trivial. The proposed protocol IS idempotent.
In principle, this causes a problem when trying to map a non-idempotent
system (such as library search systems which store "result sets"
on behalf of the client) into the web. The problem is that using
them in an idempotent way requires the intermediate result sets to be
re-evaluated at each query. This can be solved by the gateway intelligently
caching result sets for a reasonable time.
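The caching gateway suggested above might look like the following minimal sketch. It assumes the back-end search can be called as a plain function from a query string to a list of hit names, and that a fixed time-to-live is a "reasonable time"; `ResultSetCache`, `lookup`, the hypothetical `library_search` back end and the 300-second figure are illustrative choices only.

```python
import time
from typing import Callable, Optional

Search = Callable[[str], list[str]]  # query -> list of hit names


class ResultSetCache:
    """Keeps each query's result set for `ttl` seconds, so a gateway in
    front of a stateful search system can answer repeated stateless
    queries without re-running the search every time."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def get(self, query: str) -> Optional[list[str]]:
        entry = self._entries.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[query]  # too old: forget it
            return None
        return results

    def lookup(self, query: str, search: Search) -> list[str]:
        """Return cached results, or run the search and cache its results."""
        results = self.get(query)
        if results is None:
            results = search(query)
            self._entries[query] = (self.clock(), results)
        return results


if __name__ == "__main__":
    calls = []

    def library_search(query: str) -> list[str]:  # hypothetical back end
        calls.append(query)
        return [f"{query}-hit-1", f"{query}-hit-2"]

    now = [0.0]
    cache = ResultSetCache(ttl=300, clock=lambda: now[0])
    cache.lookup("hypertext", library_search)
    cache.lookup("hypertext", library_search)  # served from the cache
    now[0] = 301.0
    cache.lookup("hypertext", library_search)  # expired, so re-evaluated
    print(len(calls))
```

The injectable clock is only there so the expiry behaviour can be demonstrated without waiting; a real gateway would use the default.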
Request: Information transferred from client
Parameter keywords below, however they are represented on the network,
are given in upper case, with the names of their arguments in lower case.
This set assumes a model of format negotiation in which the client says
what he can take, and the server decides what to give him. One imagines
that each function would return a status, as well as the information
specified below.
When running over a byte stream protocol, SGML would be one encoding
possibility (as well as ASN.1, etc.).
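As a rough illustration of the SGML option, a request using the commands and parameters listed below could be written as tagged text. The element names (`REQUEST`, and one element per command or parameter) and the function `request_to_sgml` are hypothetical; no document type has been defined for them here.

```python
from xml.sax.saxutils import escape  # replaces &, < and > with entity references


def request_to_sgml(command: str, argument: str,
                    params: dict[str, list[str]]) -> str:
    """Encode one request as tagged text; element names are hypothetical."""
    for name in [command, *params]:
        # Keep element names to plain letters so values cannot break the markup.
        if not name.isalpha():
            raise ValueError(f"bad element name: {name!r}")
    lines = ["<REQUEST>", f"<{command}>{escape(argument)}</{command}>"]
    for name, values in params.items():
        for value in values:  # a parameter such as ACCEPT may repeat
            lines.append(f"<{name}>{escape(value)}</{name}>")
    lines.append("</REQUEST>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(request_to_sgml(
        "GET", "/example/doc.html",
        {"ACCEPT": ["HTML", "plain"], "USER": ["user@example.org"]},
    ))
```

Escaping the values matters: a document name or search keyword containing `<` or `&` would otherwise be read as markup.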
Here are some possible commands and parameters:
- GET document name
- Please transfer a named document back. Transfer
the results back in a standard format or one which I have said I can
accept. The reply includes the format. In practice, one may want to
transfer the document over the same link (à la NNTP) or a different
one (à la FTP). There are advantages in each technique. Using
the same link is the default; moving to a different link is negotiated
(see PORT).
- SEARCH keywords
- Please search the given index document for all items
with the given word combination, and transfer the results back as
marked up hypertext. This could elaborate to an SQL query. There are
many advantages in making the search criterion just a subset of the
document name space.
- SINCE datetime
- For a search, refer only to documents dated on or after
this date. Typically used for building a journal, or for incremental
update of indexes and maps of the web.
- BEFORE datetime
- For a search, refer only to documents dated before this date.
- ACCEPT format penalty
- I can accept the given formats. The penalty
is a set of numbers giving an estimate of the data degradation and
elapsed time penalty which would be suffered at the CLIENT end by
data being received in this way. Gateways may add or modify these
fields.
- PORT
- See the RFC959 PORT command. We could change the default so
that if the port command is NOT specified, then data must be sent
back down the same link. In an idempotent world, this information
would be included in the GET command.
- HEAD doc
- Like GET, but get only header information. One would have
to decide whether the header should be in SGML or in protocol format
(e.g. RPC parameters or internet mail header format). The function
of this would be to allow overviews and simple indexes to be built
without having to retrieve the whole document. See the RFC977 HEAD
command. The process of generating the header of a document from
the source (if that is how it is derived) is subject to the same possibilities
(caching, etc.) as a format conversion from the source.
- USER id
- The user name for logging purposes, preferably a mail address.
Not for authentication unless no other authentication is given.
- AUTHORITY authentication
- A string to be passed across transparently.
The protocol is open to the authentication system used.
- HOST
- The calling host name - useful when the calling host is not properly
registered with a name server.
- Client Software
- For interest only, the application name and version
number of the client software. These values should be preserved by
gateways.
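The ACCEPT penalty model leaves open how the server weighs degradation against elapsed time. The sketch below assumes each penalty is a pair (degradation from 0.0 to 1.0, extra seconds at the client) and that the server minimises a weighted sum; it also shows one way a gateway might add its own conversion cost, as the ACCEPT entry allows. `Accept`, `choose_format`, `add_gateway_conversion`, the format names and the weighting are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Accept:
    format: str
    degradation: float   # assumed scale: 0.0 lossless .. 1.0 useless
    time_penalty: float  # assumed: extra seconds spent at the client


def choose_format(accepts: list[Accept], available: set[str],
                  time_weight: float = 0.01) -> Optional[str]:
    """Server side: pick the available format with the lowest combined
    penalty, or None if the client accepts none of them."""
    best, best_score = None, None
    for a in accepts:
        if a.format not in available:
            continue
        score = a.degradation + time_weight * a.time_penalty
        if best_score is None or score < best_score:
            best, best_score = a.format, score
    return best


def add_gateway_conversion(accepts: list[Accept], source: str, target: str,
                           degradation: float, time_penalty: float) -> list[Accept]:
    """Gateway side: if the client accepts `target` and the gateway can
    convert `source` into it, also offer `source`, with the gateway's
    own conversion cost added to the client's penalty."""
    extra = [
        Accept(source,
               min(1.0, a.degradation + degradation),
               a.time_penalty + time_penalty)
        for a in accepts if a.format == target
    ]
    return accepts + extra


if __name__ == "__main__":
    client = [Accept("HTML", 0.0, 0.0), Accept("plain", 0.4, 0.0)]
    server_has = {"plain", "TeX"}
    print(choose_format(client, server_has))  # only plain text matches
    via_gateway = add_gateway_conversion(client, "TeX", "HTML", 0.1, 5.0)
    print(choose_format(via_gateway, server_has))  # TeX now scores lower
```

With a gateway that can turn TeX into HTML at a small cost, the server can send TeX even though the client never listed it, which is the point of letting gateways modify these fields.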
Response
Suppose the response is an SGML document, with the document type a
function of the status (Example).
- Status
- A status is required in machine-readable format. See the 3-figure
status codes of FTP, for example. Bad status codes should be accompanied
by an explanatory document, possibly containing links to further information.
A possibility would be to make an error response a special SGML document
type. Some special status codes are mentioned below.
- Format
- The format selected by the server
- Document
- The document in that format
- Success
- Accompanied by format and document.
- Forward
- Accompanied by new address. The server indicates a new address
to be used by the client for finding the document. The document may
have moved, or the server may be a name server.
- Need Authorisation
- The authorisation is not sufficient. Accompanied
by the address prefix for which authorisation is required. The browser
should obtain authorisation, and use it every time a request is made
for a document name matching that prefix.
- Refused
- Access has been refused. Sending (more) authorisation won't
help.
- Bad document name
- The document name did not refer to a valid document.
- Server failure
- Not the client's fault. Accompanied by a natural language
explanation.
- Not available now
- Temporary problem: trying at a later time might
help. This does not imply anything about the document name and authorisation
being valid. Accompanied by a natural language explanation.
- Search fail
- Accompanied by an HTML hit-list without any hits, but
possibly containing a natural language explanation.
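To show how FTP-style three-figure codes would let a client act on statuses it does not fully understand, here is a sketch mapping the statuses above to codes. The numbers themselves are invented for illustration; only the meaning of the first digit follows FTP's reply classes (RFC 959). Placing Search fail in the success class, because it still returns an (empty) hit list, and treating Server failure as permanent are both judgment calls.

```python
from enum import IntEnum


class Status(IntEnum):
    # Invented codes; only the first digit follows FTP's reply classes.
    SUCCESS = 200
    SEARCH_FAIL = 204          # completed, but the hit list is empty
    FORWARD = 301              # try again at the new address
    NEED_AUTHORISATION = 330   # try again with authorisation
    NOT_AVAILABLE_NOW = 450    # temporary problem
    REFUSED = 530              # more authorisation won't help
    BAD_DOCUMENT_NAME = 550
    SERVER_FAILURE = 551       # assumed permanent here


REPLY_CLASSES = {
    1: "positive preliminary",
    2: "positive completion",
    3: "positive intermediate",
    4: "transient negative completion",
    5: "permanent negative completion",
}


def reply_class(code: int) -> str:
    """Name the FTP-style class given by the first digit."""
    return REPLY_CLASSES.get(code // 100, "unknown")


def client_action(code: int) -> str:
    """Decide what to do from the first digit alone, so that codes the
    client has never seen are still handled sensibly."""
    first = code // 100
    if first == 1:
        return "wait for a further reply"
    if first == 2:
        return "display"
    if first == 3:
        return "retry with more information"
    if first == 4:
        return "retry later"
    if first == 5:
        return "give up"
    return "unknown"


if __name__ == "__main__":
    for status in Status:
        print(int(status), status.name, "->", client_action(status))
    print(599, "->", client_action(599))  # unfamiliar code, known class
```

The benefit of the convention is that a client meeting an unfamiliar code such as 599 still knows, from the first digit alone, that retrying will not help.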
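The "Need Authorisation" rule amounts to a prefix lookup in the browser. A minimal sketch, assuming authorisation is an opaque string (which the browser would pass across in the AUTHORITY parameter) and that the longest matching prefix wins when several apply; `AuthorisationStore`, its methods and the example addresses are hypothetical.

```python
from typing import Optional


class AuthorisationStore:
    """Browser-side memory of authorisation per address prefix."""

    def __init__(self) -> None:
        self._by_prefix: dict[str, str] = {}

    def remember(self, prefix: str, authorisation: str) -> None:
        """Record authorisation obtained after a Need Authorisation reply."""
        self._by_prefix[prefix] = authorisation

    def for_document(self, name: str) -> Optional[str]:
        """Authorisation to send with a request for `name`, if any.
        When several prefixes match, the longest (most specific) wins."""
        matches = [p for p in self._by_prefix if name.startswith(p)]
        if not matches:
            return None
        return self._by_prefix[max(matches, key=len)]


if __name__ == "__main__":
    store = AuthorisationStore()
    store.remember("/private/", "token-a")
    store.remember("/private/finance/", "token-b")
    print(store.for_document("/private/finance/q1.html"))
    print(store.for_document("/private/notes.html"))
    print(store.for_document("/public/index.html"))
```

A linear scan over the stored prefixes should be adequate for the handful a user is likely to hold; a trie would only matter at a much larger scale.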
_________________________________________________________________
Tim BL