A Prototype for a Stateless File Server and its Cache Mechanisms

Kohei Honda (lecturer)
Queen Mary and Westfield College, London, U.K.
November 2000

1. Basic Description of Coursework

Background. This coursework is about the design of the system components required to support an application which reads and writes a file over the Internet.

Objectives. Your task is to make a pair of components: a file server (to be remotely situated) and a client-side library. Clients can use these two components to access a file located remotely, by means of TCP over the Internet. The objective in carrying it out is to obtain a deeper understanding of stateless file servers and their cache mechanisms. You are asked to produce two pieces of software:

(1) A simple stateless file server, which reads from files and writes to files in response to requests transmitted over TCP. It provides simple read, write and get-attribute operations. You do not need to provide other mechanisms such as security, locking, concurrency control and fault tolerance.

(2) A library class (API) for a client of (1), enabling it to interact with (1) over the Internet. This part is to be done in two stages: no caching is required in the first stage; in the second stage you should incorporate it. (You can also make a client itself, although a default client is provided.)

You are not required to implement a file server's most complicated bits, i.e. the direct interface with a disk device driver. Your server simply uses Java's standard API to open, read, write and close files. This is because the objective of this coursework is to create a stateless file server which allows clients to access files via TCP requests, and to learn about cache mechanisms on the client side.

Your system will be tested by using a specific application - a weather forecast database. A client requests data by sending a request over the Internet specifying a file name; the data is read by a remote server, which sends it back as a reply to be displayed by the client. The default data and clients for checking the functionality and measuring the performance are provided.

Stages. There are two stages in this coursework. The first stage is to design and implement a file server and its client-side API without a cache mechanism. The second stage is to design and implement a cache mechanism which may use what NFS usually does (easier), or may be based on call-backs (more difficult).

2. Detailed Specification

2.1 Remote file server

File Path. The server has a "default directory" (which should be one of the subdirectories of your home directory). A user initially specifies the file name in the following form: IP-address:port/filepath. The IP-address refers to the host where your server is running: for example it can be bronwyn (one can use a host name instead of an IP address). port is the port at which your server is listening, which can be any 16-bit number greater than 1023. For security reasons, only allow the filepath to be a single file name without a directory path, i.e. check that the file name does not contain "/". This is important since, in this way, any possible disaster in your home directory is restricted to that specific directory.
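The checks above can be sketched in Java as follows. This is only a minimal illustration: the class name RemotePath and its method parse are hypothetical, and rejecting "." and ".." is an extra precaution assumed here, not something the specification states.

```java
/** Hypothetical helper (not part of the provided interface): parses "host:port/filename". */
final class RemotePath {
    final String host;
    final int port;
    final String fileName;

    private RemotePath(String host, int port, String fileName) {
        this.host = host;
        this.port = port;
        this.fileName = fileName;
    }

    /** Returns the parsed parts, or throws IllegalArgumentException if the spec is invalid. */
    static RemotePath parse(String spec) {
        int colon = spec.indexOf(':');
        int slash = spec.indexOf('/', colon + 1);
        if (colon <= 0 || slash < 0) {
            throw new IllegalArgumentException("expected host:port/filename: " + spec);
        }
        String host = spec.substring(0, colon);
        if (host.contains("/")) {
            throw new IllegalArgumentException("bad host in: " + spec);
        }
        int port;
        try {
            port = Integer.parseInt(spec.substring(colon + 1, slash));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bad port in: " + spec);
        }
        // Any 16-bit number greater than 1023 is allowed.
        if (port <= 1023 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        String fileName = spec.substring(slash + 1);
        // A single file name only: no directory separators.
        // Assumption: "." and ".." are also rejected so the name cannot leave the default directory.
        if (fileName.isEmpty() || fileName.contains("/")
                || fileName.equals(".") || fileName.equals("..")) {
            throw new IllegalArgumentException("file name must be a single name: " + fileName);
        }
        return new RemotePath(host, port, fileName);
    }
}
```

For example, RemotePath.parse("bronwyn:2000/forecast.txt") is accepted, while "bronwyn:2000/dir/forecast.txt" and "bronwyn:80/forecast.txt" are rejected.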

Operations. Apart from the initial lookup operation, operations are done via a file handle (which is a long integer by default). The server will export only read, write and getattribute operations. Concretely the interface which the server will export to your client-side library should be:

res = read(fh, offset, n), where res contains (1) a sequence of bytes read (if the end of the file is encountered, up to there) and (2) the last modified time of the file.
res = write(fh, offset, data), where res contains (1) a boolean which tells the result of operation, and (2) the last modified time of the file. The default semantics is write-through.
res = lookup(IP:port/path), where res contains (1) the file handle and (2) the last modified time.
res = getattrib(fh), where res contains the last modified time of the file.

Note that these are not Java methods: they are protocol messages exchanged between your server and your client-side library. You can therefore fill in the details as you prefer, as long as the operations are stateless.
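Since the wire format is left open, one possible encoding of the read request is sketched below. Everything here is an assumption made for illustration: the class name ReadRequest, the opcode value and the field order are not prescribed by this specification. The point is that each request carries all the information (handle, offset, length) the server needs, so the server does not have to remember anything between requests.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical wire format for read(fh, offset, n): 1-byte opcode, 8-byte fh, 8-byte offset, 4-byte n. */
final class ReadRequest {
    static final byte OP_READ = 1; // illustrative opcode, chosen arbitrarily

    final long fh;
    final long offset;
    final int n;

    ReadRequest(long fh, long offset, int n) {
        this.fh = fh;
        this.offset = offset;
        this.n = n;
    }

    /** Serialises the request, e.g. onto a socket's output stream. */
    void writeTo(OutputStream out) throws IOException {
        DataOutputStream d = new DataOutputStream(out);
        d.writeByte(OP_READ);
        d.writeLong(fh);
        d.writeLong(offset);
        d.writeInt(n);
        d.flush();
    }

    /** Reads one request back, e.g. from a socket's input stream on the server side. */
    static ReadRequest readFrom(InputStream in) throws IOException {
        DataInputStream d = new DataInputStream(in);
        byte op = d.readByte();
        if (op != OP_READ) {
            throw new IOException("unexpected opcode " + op);
        }
        long fh = d.readLong();
        long offset = d.readLong();
        int n = d.readInt();
        return new ReadRequest(fh, offset, n);
    }
}
```

The other three operations could be encoded in the same style with their own opcodes.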

Statelessness. The server should be designed to be stateless. As already noted, it is not necessary to have multi-threading, concurrency control, etc. for this. As long as this requirement is satisfied, any reasonable design is permitted, although the more you deviate from the above, the more clearly you should justify your design in your document. The design of the server may not differ between the two stages, except that new operations may be added (such as those for callbacks, or for specifying the mode with which a file is being used).
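One way to keep the server stateless is sketched below, under stated assumptions. The class StatelessFileOps and its methods are hypothetical names. The file handle is assumed to be derived from the file name alone and resolved by scanning the default directory, so no handle table has to survive between requests (hash collisions are ignored in this sketch). Each operation opens the file, works at the offset given in the request, and closes it again.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical server-side operations; nothing is remembered between calls. */
final class StatelessFileOps {
    /** Result of read: the bytes actually read and the file's last modified time. */
    static final class ReadResult {
        final byte[] data;
        final long lastModified;
        ReadResult(byte[] data, long lastModified) {
            this.data = data;
            this.lastModified = lastModified;
        }
    }

    private final File baseDir; // the server's default directory

    StatelessFileOps(File baseDir) { this.baseDir = baseDir; }

    /** Assumption: the handle is a hash of the file name, so it stays valid across restarts. */
    static long handleFor(String fileName) {
        long h = 1125899906842597L;
        for (byte b : fileName.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b;
        }
        return h;
    }

    /** Maps a handle back to a file by scanning the directory; no table is kept in memory. */
    File resolve(long fh) throws IOException {
        File[] files = baseDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && handleFor(f.getName()) == fh) return f;
            }
        }
        throw new FileNotFoundException("unknown handle " + fh);
    }

    /** Reads up to n bytes at offset; fewer are returned if the end of the file is reached. */
    ReadResult read(long fh, long offset, int n) throws IOException {
        File f = resolve(fh);
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            int len = (int) Math.max(0, Math.min(n, raf.length() - offset));
            byte[] buf = new byte[len];
            raf.seek(offset);
            raf.readFully(buf);
            return new ReadResult(buf, f.lastModified());
        }
    }

    /** Writes data at offset and returns the new last modified time. */
    long write(long fh, long offset, byte[] data) throws IOException {
        File f = resolve(fh);
        // "rwd" asks for content updates to be written synchronously to the storage device.
        try (RandomAccessFile raf = new RandomAccessFile(f, "rwd")) {
            raf.seek(offset);
            raf.write(data);
        }
        return f.lastModified();
    }

    long getattrib(long fh) throws IOException {
        return resolve(fh).lastModified();
    }
}
```

Because every call is self-contained, a server restarted between two requests can still serve the second one. The directory scan is the price paid here for not keeping a table; other handle schemes are equally acceptable if justified.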

2.2 Client-side API

2.2.1 Stage 1

Operations. The realisation of a client-side API which interacts with a stateless server is the main topic of this project. You should provide (a simplified version of) the standard stream-based operations which you are familiar with. While certain variations are allowed, it is required that you realise the following interface.

fh = open(IP:port/path), where fh is a file handle. This may or may not be the same as the handle used in the remote file system.
res = write(fh, data), where res contains a boolean which tells the result of the operation (success or failure).
res = read(fh, byte [] data), where the resulting data is read to the byte array data (if the length of the array data is n, then n bytes are read). If fh is at the end of the file, that part is read as null.
res = close(fh), where res contains a boolean which tells the result of the operation.

The above API is to be realised as Java methods. The definition of its interface is provided.

Semantics. The following gives the preferred semantics.

File-pointer. This simply moves on as you read/write data, starting from the beginning and finishing at the end of the file. At the time of open, you may specify the mode as reading or writing - this is not necessary in Stage 1.
Reading. This simply reads from the file.
Writing. Writing should be write-through.
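The semantics above imply that the file pointer is kept on the client side, since the server is stateless. A minimal sketch of this idea follows. It is not the provided interface definition: ServerCalls is a hypothetical stand-in for the TCP exchange, ClientFileApi is an illustrative name, and returning the number of bytes read from read is an assumption about what res contains.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the TCP exchange with the stateless server. */
interface ServerCalls {
    long lookup(String path);
    byte[] read(long fh, long offset, int n);
    boolean write(long fh, long offset, byte[] data);
}

final class ClientFileApi {
    /** Per-open-file state lives here, on the client, not on the server. */
    private static final class OpenFile {
        final long serverFh;
        long position = 0; // the file pointer, starting at the beginning of the file
        OpenFile(long serverFh) { this.serverFh = serverFh; }
    }

    private final ServerCalls server;
    private final Map<Integer, OpenFile> openFiles = new HashMap<>();
    private int nextFh = 1;

    ClientFileApi(ServerCalls server) { this.server = server; }

    /** Looks the file up once and returns a local handle (distinct from the server's handle). */
    int open(String path) {
        int fh = nextFh++;
        openFiles.put(fh, new OpenFile(server.lookup(path)));
        return fh;
    }

    /** Fills data from the current position; returns the number of bytes read, or -1 if fh is unknown. */
    int read(int fh, byte[] data) {
        OpenFile f = openFiles.get(fh);
        if (f == null) return -1;
        byte[] got = server.read(f.serverFh, f.position, data.length);
        System.arraycopy(got, 0, data, 0, got.length);
        f.position += got.length; // the pointer moves on as data is read
        return got.length;
    }

    /** Write-through: the data goes to the server immediately, at the current position. */
    boolean write(int fh, byte[] data) {
        OpenFile f = openFiles.get(fh);
        if (f == null || !server.write(f.serverFh, f.position, data)) return false;
        f.position += data.length;
        return true;
    }

    /** Only local state is discarded; the stateless server has nothing to release. */
    boolean close(int fh) {
        return openFiles.remove(fh) != null;
    }
}
```

In this sketch close never contacts the server, which is one visible consequence of the stateless design.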

Fundamental Requirement. Your software should run, with clear and readable documentation: in particular, any client which uses the interface above should be able to read/write a file remotely over the net.

2.2.2 Stage 2

Operations. These are the same as Stage 1, though you may add a read/write mode in the open operation for the control of caching.

Semantics. Here you should add a client-side cache mechanism to your original program. Thus, the method read on the client side will read from the cache. The method write may still be write-through, but must also write to the local cache.
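A minimal sketch of these semantics is given below, assuming for simplicity that the whole file is cached on first read. The interface RemoteFile is a hypothetical stand-in for the Stage 1 network calls on one open file, and WriteThroughCache is an illustrative name. Coherence control is deliberately left out.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the Stage 1 network calls on one already-open file. */
interface RemoteFile {
    byte[] fetchAll();                       // whole file contents from the server
    boolean store(long offset, byte[] data); // write-through to the server
}

final class WriteThroughCache {
    private final RemoteFile remote;
    private byte[] cached; // null until the first read populates it

    WriteThroughCache(RemoteFile remote) { this.remote = remote; }

    /** Copies up to dest.length bytes starting at offset from the cache; returns the count. */
    int read(long offset, byte[] dest) {
        if (cached == null) {
            cached = remote.fetchAll(); // only the first read goes to the server
        }
        if (offset < 0 || offset >= cached.length) return 0;
        int n = (int) Math.min(dest.length, cached.length - offset);
        System.arraycopy(cached, (int) offset, dest, 0, n);
        return n;
    }

    /** Writes to the server first, then applies the same change to the local copy. */
    boolean write(long offset, byte[] data) {
        if (!remote.store(offset, data)) return false;
        if (cached != null) {
            int end = (int) offset + data.length;
            if (end > cached.length) cached = Arrays.copyOf(cached, end);
            System.arraycopy(data, 0, cached, (int) offset, data.length);
        }
        return true;
    }
}
```

Updating the local copy only after the server has accepted the write keeps the cache from running ahead of the server if the write fails.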

For implementation of the cache mechanism, two possible ideas are:

What you can find in NFS, i.e. the one based on staleness (oldness) of the data and a timestamp.
The eager or aggressive caching of the whole file, together with a call-back mechanism (see the textbook for the idea).

These two are the recommended ways, though other forms may be possible if they make sense.
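The first idea (staleness plus timestamp) can be sketched as follows. This is an illustration under assumptions, not a prescribed design: the class name FreshnessCheckedEntry and the parameter freshnessMillis are hypothetical, and the current time is passed in explicitly to keep the logic easy to test. A cached copy is used without contacting the server while it is recent enough; once it is too old, the client calls getattrib and compares the server's last modified time with the one recorded when the data was fetched.

```java
/** Hypothetical cache entry validated in the NFS style: freshness interval plus timestamp comparison. */
final class FreshnessCheckedEntry {
    private final long freshnessMillis; // how long a validated copy is trusted (tuning parameter)
    private byte[] data;
    private long serverModified;        // last modified time reported by the server for this data
    private long lastValidated;         // local time at which the data was last known to be current

    FreshnessCheckedEntry(byte[] data, long serverModified, long now, long freshnessMillis) {
        this.data = data;
        this.serverModified = serverModified;
        this.lastValidated = now;
        this.freshnessMillis = freshnessMillis;
    }

    /** True if the copy was validated recently enough to be used without asking the server. */
    boolean isFresh(long now) {
        return now - lastValidated < freshnessMillis;
    }

    /**
     * Called with the result of getattrib once the entry is no longer fresh.
     * Returns true if the cached data is still valid; false if it must be fetched again.
     */
    boolean revalidate(long currentServerModified, long now) {
        if (currentServerModified != serverModified) return false;
        lastValidated = now;
        return true;
    }

    /** Replaces the cached data after a new fetch from the server. */
    void refill(byte[] newData, long newServerModified, long now) {
        this.data = newData;
        this.serverModified = newServerModified;
        this.lastValidated = now;
    }

    byte[] data() { return data; }
}
```

A read would then proceed as: if isFresh, use data; otherwise call getattrib and revalidate; if that fails, read from the server and refill. A shorter freshness interval gives better consistency at the cost of more getattrib traffic.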

Fundamental Requirement. In general, a cache mechanism consists of (A) storing data locally and reading from it as far as possible, and (B) controlling the coherence (consistency). To get a moderate mark from this stage, you should at least complete (A) and give a design for (B) based on your implementation.

3. Deliverables

3.1 Stage 1: 10 marks

This stage should be accompanied by runnable source code.

Programs: source files compilable and runnable under Linux with a standard classpath.
Document (1-3 pages):

1. Basic documentation on the specs and design.

2. Measurement results of read-write operations for your server (using the default data provided).

3. Brief discussions on those functionalities which are not included in the given specification but which are necessary for a (local area and distant) distributed file service.

3.2 Stage 2: 10 marks

This stage can be submitted in three variations: (1) a concrete and clear design without any runnable code; (2) in addition, the implementation of part (A) in 2.2.2; (3) in addition, the implementation of part (B) in 2.2.2, i.e. a fully functional cache mechanism. (2) and (3) will be marked substantially higher than (1) alone. The deliverables consist of the following.

Program: source files compilable and runnable under Linux with a standard classpath.
Document (2-3 pages):

1. Basic documentation on the specification and design of the cache mechanism.

2. Measurement results of read-write operations for the same task as Stage 1, this time using the cache.

3. A brief analysis of your design and the result of your measurements, including discussions on how your cache mechanisms (do not) contribute to the difference in performance.

Additional Report: 5 marks, the total restricted to 20 marks

Those who have completed Stage 1 and the design part of Stage 2 can also write an essay of 1-2 pages which discusses one of the following points:

(I) The pros and cons of call-back mechanisms in such an application and in a wide-area file service. In particular you would discuss the use of leases for callbacks, in the context of various applications such as (a) this application, (b) an application in which clients occasionally do both writes and reads, and (c) an application whose data is updated frequently at the server side. As another interesting point, leases can be used for measuring the usage of a service, hence for charging. What other ways of charging for (file) services can you think of? What are the pros and cons of different approaches including the use of leases?

(II) Basic security measures which can be taken for a wide-area file service. Discuss in particular (1) confidentiality and (2) authenticity: what problems will arise and what possible solutions can be considered.

Marking Methods

Two marking methods will be used.

(I) Evaluation of the submission. The submission (to be done electronically) should consist of: (1) sources compilable and runnable under Linux; (2) a Word file for the document, with your Name, StudentNo, what stages you covered, and the document itself. The structure of the document should follow the above description (you can use section/subsection headers etc. as appropriate).

(II) Oral tests (five students in one group, 10 minutes per group). We discuss the strong and weak points of each student's design. This may or may not involve a demonstration.

Criteria for marking are:

Programs: whether they work or not; whether your design is sound/clear/soundly-innovative or not. I take documentation into consideration in evaluating design.
Document: basic understanding; sound ideas; sound and new ideas.
Oral test: depth of understanding.