Topics for student projects with the Centre for Software Reliability
This is a list of possible projects that are of interest to staff in CSR and generally linked to CSR's research. If interested, please contact the members of Centre staff indicated with each topic.
We are also interested in developing project ideas proposed by students in the areas of reliability or safety, and of software engineering methods for improving or assessing them.
Many of these projects pose challenging programming tasks. If you are instead interested in a project with predominant emphasis on non-programming problems, follow this link: non-programming, research-oriented projects. You may also wish to look at some titles of projects supervised in the past.
Possible supervisors in CSR are: Robin Bloomfield, Cristina Gacek, Kevin Jones, Bev Littlewood, Peter Popov, Andrey Povyakalo and Lorenzo Strigini (see profiles of CSR staff).
"Design and build" projects
A Multi-calendar consistency tool/app
Contact: Kevin Jones
We use a number of calendars every day on our desktop machines, pads, smartphones, etc. Many of them sync nicely with each other but some don't. In particular, I use two calendars at the University, one being Exchange and the other being a text-based wiki calendar. I need an app for my iPhone that allows me to update my phone calendar and automatically updates all other calendars, including gcal and the wiki cal. For strong and ambitious students, I'd like voice recognition on the meeting input and the ability to ask the app to suggest the best meeting slot for a given list of people based on their availability in any of their calendars, even the free-form text one on the wiki.
This is somewhere between a design-and-build and a client-based project, with me as the client, since I want it to work and will use it if it does. It requires reasonable but not extreme programming skills and a good understanding of how to design usable applications.
A Perfect Timetabling Program
Contact: Kevin Jones
Simply put, the University timetabling problem is reasonably well defined: We have students, lectures, lecturers, labs, rooms.
I would like this project to do a proper requirements analysis to work out what all the factors really are (e.g. Monday morning at 8am is a bad time for a 3rd year lab) and to design, build and test a program that produces the best possible solution to the problem, where "best possible" could mean fewest unhappy people, or best use of good rooms, or ..., and does it with a decent interface so people would actually use the tool.
This would be a design and build project for a student with decent programming and HCI skills.
Hacked - a game/simulation project
Contact: Kevin Jones
This link shows a graphic illustration courtesy of NASA of why software reliability is important to ATC.
I want to build something similar for computer security. This could go a number of different ways depending on the level of ambition of the student.
I'd like to see a number of nodes in a cluster. Each node has some variable properties (strength of defences, detection ability, connections to trusted nodes, ...). There are many possible attackers on the network and they each have their own set of skills (chance of finding a node, ability to avoid detection, ability to crack defences, ...).
What I want this project to do is to provide a way of playing out various scenarios and illustrating them in a nice visual way. There can be elements of a game, AI, HCI and real out-of-the-box thinking. This project is best suited to someone with strong programming skills.
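To give a feel for what "playing out a scenario" might involve, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `Node` and `Attacker`, the 0 to 1 scale for each property, and the probability rules inside `attempt` are hypothetical and are not part of the project brief. Connections to trusted nodes are left out to keep the sketch short.

```python
import random
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    defence: float     # strength of defences, assumed to lie in 0..1
    detection: float   # detection ability, assumed to lie in 0..1
    compromised: bool = False

@dataclass
class Attacker:
    name: str
    discovery: float   # chance of finding a node
    stealth: float     # ability to avoid detection
    cracking: float    # ability to crack defences

def attempt(attacker, node, rng):
    """Play one attack and return its outcome as a string (assumed rules)."""
    if rng.random() > attacker.discovery:
        return "missed"                      # attacker never found the node
    if rng.random() < node.detection * (1 - attacker.stealth):
        return "detected"                    # node spotted the attack
    if rng.random() < attacker.cracking * (1 - node.defence):
        node.compromised = True
        return "compromised"
    return "repelled"

def play(nodes, attackers, rounds, seed=0):
    """Each round, every attacker picks a random node; returns the event log."""
    rng = random.Random(seed)
    log = []
    for r in range(rounds):
        for attacker in attackers:
            target = rng.choice(nodes)
            log.append((r, attacker.name, target.name, attempt(attacker, target, rng)))
    return log
```

The event log returned by `play` is the kind of data a visual front end could animate; a fixed `seed` makes a scenario repeatable, which helps when comparing defensive settings.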
Diversity for security
Contact: Ilir Gashi, Vladimir Stankovic
All systems, including those built from off-the-shelf components, need to be sufficiently reliable and secure in delivering the service that is required of them. Several empirical studies have been completed recently at CSR which have evaluated and assessed the level of security that can be achieved with these systems. We have specifically researched the potential security benefits that can be achieved from combining more than one diverse product to deliver a service. For example, we have looked at the potential security gains from employing diverse antivirus engines.
There is plenty of scope for further work in this area, by implementing further data analysis and data collection methods. Projects in this area are likely to be hard and require critical thinking, but their successful completion can produce very useful results, which may be publishable, and hence lead to a very high mark. There is scope for both design-and-build and research-oriented projects. Possible topics include:
- Exploration of potential benefits of using diverse anti-virus engines (more than 2) to improve the security of a system;
- Analysing the National Vulnerability Database to investigate the potential benefits of diversity towards improving the security of a system;
- Exploring and researching the different architectural solutions and licensing models of AntiVirus products.
JDBC Driver for Div-SQL server
Contact: Peter Popov
Database replication is widely used to improve performance and availability of a variety of computer-based systems. Currently, both academic and industrial solutions are based on non-diverse replication, whereby distributed database servers from the same vendor are put together for scalability and availability improvement, e.g. Oracle's RAC product.
However, the use of database servers from the same vendor (e.g. Oracle) poses a dependability issue. Non-diverse database servers are inadequate protection against software faults - a 'bug' in one Oracle server is likely to manifest itself in all the servers in the cluster. A solution to this problem is to use diverse database servers for database replication, e.g. database servers from both Oracle and Microsoft can be deployed together in one solution, or even an open-source database server like MySQL can be used.
An innovative diverse database replication solution (referred to as Div-SQL) has been developed in the Centre for Software Reliability (CSR) to tackle the dependability issue. A test harness exists for performance measurements with TPC-C, a benchmark standard for on-line transaction processing (OLTP).
The project will involve development of a JDBC driver for Div-SQL, which will allow Java applications to be built to work with Div-SQL.
The expectations at the end of the project are that:
- a server-like middleware will be developed (reusing the code base of the existing middleware) which is deployed as a server (listening to a port);
- a client application will be developed (reusing the existing code base of a TPC-C client) which will use the newly created JDBC driver to connect to the middleware server (described above);
- a minimal set of experiments is conducted with the new test harness.
Resources to be made available are:
- Existing test harness (a multi-threaded Java application) which combines the functionality of the TPC-C client and a middleware handling concurrent execution of multiple clients connected to Div-SQL;
- The source code of another JDBC driver.
Critical Infrastructure Modelling Tool using GMF
Contact: Peter Popov
Interdependency between critical infrastructures has been recognised as an important factor limiting the reliable supply of services (e.g. of energy or telecommunication services). An example of interdependencies is that the power transmission is controlled remotely using telecommunication channels, failure of which reduces controllability of energy transmission (e.g. ability to switch on or off some of the lines/breakers, etc.). Similarly, telecommunication devices require power, thus in case of a blackout, the telecommunication services cannot be guaranteed beyond a short period of time (dependent on the capacity of the batteries or the fuel available to run generators).
The Centre for Software Reliability has been involved in several projects in the last few years and has developed its own methodology for interdependency analysis, with tool support consisting of a front-end (Interdependency Designer) and run-time support. The Designer allows a user to construct a study (using a set of diagrams) and save the study data in several files. The run-time support reads the study files and, via simulation, collects statistics useful for calculating the strength of dependence between the modelled infrastructures.
The expectation is that at the end of the project:
- A new Designer is created using the Eclipse Graphical Modeling Framework (GMF) which allows for:
- Project Support (creating/copying/removing a project and all its diagrams, adding/removing/modifying diagrams)
- At least 2 types of diagrams are fully supported in the tool out of those defined in the current version of the Designer.
- Exporting the model data in a format understood by the current run-time support;
- A trivial study is created and successfully deployed on the current run-time support.
Resources to be made available are:
- The existing tool for interdependency analysis (Designer, Run-time support, User's guide)
- Advice on building interdependency models and using the tool currently available
'Breaking Things'
Contact: Kevin Jones
Much of the emphasis in Computer Science/Engineering education is on building things. The other side of the coin, Breaking Things, also has an important role and equips the student to dive into the areas of testing and validation of systems.
There are a number of different projects in this area that could be undertaken depending on background and interest. In general, you would identify a system of interest, which could be:
- a moderate scale one that you build yourself;
- an example of a real world system that you know well or are interested in exploring;
- an actual system of interest to a client.
This system (software/hardware/socio-technical) should be studied in detail to examine all the ways in which it might fail, with the aim of developing a comprehensive methodology for ensuring the correctness of the system, developing test tools if necessary.
This could be done as a 'Research Oriented Project', with the output being a complete description of the failure modes of a significant system and descriptions of strategies for testing for and mitigating these failures in practice (ranging from pragmatics of efficient testing through to formal approaches and even proof if desired). Alternatively, it could be a 'Design and Build' project where tools are designed and built to aid in the testing of a more modest system. Real world examples as 'Client-based Projects' would be particularly interesting.
Formal development of a program
Contact: Kevin Jones
Using techniques and tools from the 'Formal Methods' space, this project would involve the formal specification of a program/application of interest (student's choice) and using this specification as the basis for a formal development of the associated code. This could be done as a 'Research Oriented Project' where the result is a paper development with associated proofs, or as a 'Design and Build' project where the code is developed and run, allowing an analysis of the benefits/drawbacks of the formal approach.
The specification would be a significant part of a complex application, whereas development proofs would play a bigger part in simpler applications.
How reliable are popular cryptographic products?
Contact: Lorenzo Strigini
Cryptographic software is a cornerstone of security in the modern world, and cheap or free encryption/decryption software is widely available. Cryptographic algorithms are subject to public scrutiny and critique, improving trust in their effectiveness. But their implementation in software may introduce bugs that invalidate the properties of the algorithms. For a start, wrong encryption may produce undecipherable gibberish. More insidiously, it may produce output that is vulnerable to decryption. This project will assess reliability of one or more cryptographic packages via statistical testing (we teach this in module IN2016, and it is explained here). The intended benefits of this project are:
- assessing the reliability of some specific software;
- documenting a thorough method for testing such software statistically;
- improving awareness of the issue, by making available the method, or by publicising difficulties or disappointing results.
Time planning tools based on icalendar format
Contact: Lorenzo Strigini
I manage my diary via an application that stores data in the popular format "icalendar". This application is OK for scheduling meetings and travel, but leaves an important set of needs unsatisfied. This project will produce software to satisfy a meaningful set of these needs, which are explained below. My time is divided between multiple projects, and I need to:
- plan in advance which days or hours I will spend on each one;
- have an automated facility to which I can specify constraints on my planning (e.g.: "I need 15 workdays for project X between the 10th of June and the 25th of November") and that will make sure my planning respects these multiple constraints;
- change this planning in response to new events (e.g. if I am ill for a period), but still automatically satisfying the constraints;
- record deviations from the plan after they happen;
- collect reports of the time spent.
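The constraint-handling part of these needs can be illustrated with a small Python sketch. It is only one possible approach, assumed here for illustration: whole workdays are handed out with a simple earliest-deadline-first rule, the names `workdays` and `plan` are hypothetical, and reading or writing the icalendar format is not covered.

```python
from datetime import date, timedelta

def workdays(start, end, blocked=frozenset()):
    """Yield Monday-to-Friday dates in [start, end] that are not blocked."""
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in blocked:
            yield day
        day += timedelta(days=1)

def plan(constraints, blocked=frozenset()):
    """constraints maps project -> (days_needed, window_start, window_end).

    Returns {date: project}. Raises ValueError if this simple rule cannot
    place all the required days (assumed behaviour for the sketch)."""
    remaining = {p: n for p, (n, _, _) in constraints.items()}
    first = min(s for _, s, _ in constraints.values())
    last = max(e for _, _, e in constraints.values())
    schedule = {}
    for day in workdays(first, last, blocked):
        # Projects whose window contains this day and that still need days
        candidates = [p for p, (_, s, e) in constraints.items()
                      if s <= day <= e and remaining[p] > 0]
        if candidates:
            # Earliest deadline first: serve the project whose window closes soonest
            chosen = min(candidates, key=lambda p: constraints[p][2])
            schedule[day] = chosen
            remaining[chosen] -= 1
    unmet = {p: n for p, n in remaining.items() if n > 0}
    if unmet:
        raise ValueError(f"cannot satisfy: {unmet}")
    return schedule
```

Replanning after a new event, such as a period of illness, then amounts to calling `plan` again with the affected dates passed in `blocked`; days already worked would have to be fixed in place, which this sketch does not do.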
Performance comparison of database replication protocols
Contact: Peter Popov
Database replication is widely used to improve performance and availability of a variety of computer-based systems. Currently, both academic and industrial solutions are based on non-diverse replication, whereby distributed database servers from the same vendor are put together for scalability and availability improvement, e.g. Oracle's RAC product.
However, the use of database servers from the same vendor (e.g. Oracle) poses a dependability issue. Non-diverse database servers are inadequate protection against software faults - a "bug" in one Oracle server is likely to manifest itself in all the servers in the cluster. A solution to this problem is to use diverse database servers for database replication, e.g. database servers from both Oracle and Microsoft can be deployed together in one solution, or even an open-source database server like MySQL can be used.
An innovative diverse database replication solution (referred to as Div-SQL) has been developed in the Centre for Software Reliability (CSR) to tackle the dependability issue. In addition, we would like to test the performance of the proposed replication protocol. Therefore, the project's cornerstone is the performance comparison of Div-SQL with well-known academic solutions. The project will involve performance comparison of Div-SQL with:
- Middle-R - a non-diverse database replication solution
- HRDB - a diverse database replication solution from the research group at MIT (USA), led by a Turing Award-winning scientist, Prof. Barbara Liskov.
Simulated random testing of software
Contact: Bev Littlewood
Some researchers have recently reported that random testing of software is surprisingly 'predictable'. That is, if you repeatedly test a piece of software, for the same amount of time (with different random seeds each time), the number of faults found will not vary much from one test to another. Why is this? In particular, does it say anything about the distribution of the 'sizes' of the faults? I would like to investigate by conducting some simulation experiments involving random testing with different fault-size distributions.
NOTE: There is scope for a "Design and Build" project (building the simulation software yourself) OR a "Research only" (non-programming) project (using a statistics package such as R).
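As an indication of what such a simulation experiment might look like, here is a minimal Python sketch. It rests on assumptions that are mine and not part of the project description: the "size" of a fault is taken to be the probability that a single random test triggers it, faults are triggered independently of each other, and the log-uniform distribution is just one example of a fault-size distribution that could be compared.

```python
import random
import statistics

def run_test_campaign(fault_sizes, n_tests, rng):
    """Return how many distinct faults are found in n_tests random tests.

    fault_sizes[i] is the assumed probability that one test triggers fault i."""
    found = 0
    for p in fault_sizes:
        # Probability that fault i is hit at least once in n_tests independent tests
        if rng.random() < 1 - (1 - p) ** n_tests:
            found += 1
    return found

def repeat(fault_sizes, n_tests, n_runs, seed=0):
    """Repeat the campaign with a different random seed each time.

    Returns (mean, standard deviation) of the number of faults found."""
    counts = [run_test_campaign(fault_sizes, n_tests, random.Random(seed + i))
              for i in range(n_runs)]
    return statistics.mean(counts), statistics.pstdev(counts)

def log_uniform_sizes(n, low, high, rng):
    """Draw n fault sizes spread evenly on a log scale between low and high."""
    return [low * (high / low) ** rng.random() for _ in range(n)]
```

Calling `repeat` on a list of equal fault sizes and on a list produced by `log_uniform_sizes`, for the same number of tests, gives two standard deviations whose comparison is the kind of evidence the project would be looking for.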
Analyzing program structure
Contact: Robin Bloomfield
We have been investigating the control and dataflow structure of COTS software and have some theories about the general topology of COTS software. Several approaches to extracting the control and dataflow are possible: one would be to use a commercial tool such as CodeSurfer (free for academic research), and another is to use the approach described below in the C call graph analyser project from Peter Bishop. We want to analyse a range of open source software to derive the very large call graphs and store them in a suitable representation. The project has plenty of scope for extension and challenge, either in scaling to analyse large programs or in the analysis of the resulting graphs.
Developing a graphical notation and prototype based on the Adelard ASCE environment
Contact: Robin Bloomfield
A number of approaches have been proposed for supporting the application of standards and compliance tools for the safety and security of computer-based systems. Recently the nuclear industry has developed an Excel-based approach. This project would look at the requirements of users with this tool (by interviewing staff in Adelard, a consultancy co-located with CSR) and develop a graphical notation and prototype based on the Adelard ASCE environment. The project would propose ways in which these two approaches could be compared with the user requirements. This would be a development project with a research flavour.
Developing a prototype in the Adelard ASCE environment for analysing dependencies between critical infrastructures
Contact: Robin Bloomfield
In CSR we are developing an approach for analysing dependencies between critical infrastructures (see IRRIIS), called Preliminary Interdependency Analysis (PIA). We would like to experiment with some prototype graphical interface tools to support the service-oriented modelling. The project would capture requirements (from CSR) and develop a prototype in the Adelard ASCE environment. This would involve designing a graphical notation, defining textual support and possible interfaces to analysis tools. The ASCE environment facilitates all these. Again, a research project needing development skills and the ability to get to grips with emerging work on critical infrastructure modelling.
New features in tool for experiments on human reaction to computer error: calculator quiz (two projects)
Contact: Lorenzo Strigini
These projects will support ongoing research on the dependability of human-computer systems (cf the description of our DIRC project). An aspect of this problem is how computer error affects humans: do they recognise it and react properly? This is a critical issue in many areas. For example, look at the recent research on mammography.
To run experiments in this area, we need a tool that simulates a session with some kind of computer application, allows us to make this application make mistakes according to patterns that we specify, and logs the users' reactions. Such a tool has been successfully developed by a previous project, and we now need to develop new features and various changes. You can preview the tool. The experimental task is to perform arithmetic calculations with the aid of an unreliable calculator.
The tool is web-based, using PHP, a database for storing settings and experiment results, and JavaScript.
- Project 1 will implement a selection of improvements from a "shopping list" developed in the previous project, including items like: devising a method, with a good user interface, for controlling the difficulty of the questions; dealing with the complexities of discriminating between correct and erroneous responses to quizzes involving real numbers; etc.
- Project 2 will build a game-like shell around the tool, to make it an attractive challenge for potential volunteer experimental subjects.
Both types of projects require a possibly limited amount of programming but with substantial challenges in terms of understanding stakeholder needs and (with the help of appropriate readings) translating them into suitable requirements and specifications for the software.
The reasoning behind the requirements will be an important part of the outputs, since it should be applicable to similar experiments concerning other experimental tasks.
Tool for experiments on human reaction to computer error in spell checking test
Contact: Lorenzo Strigini
(For the background to this project, read the description of project topic "New features in tool for experiments on human reaction to computer error: calculator quiz".)
This project will develop a tool for administering a psychology experiment in which the subject has to perform a spell-checking test. The requirement elicitation will take advantage of experience in previous projects on other experimental tools. There may also be some code reuse. A previous UG project has also developed a prototype tool for inserting errors in English text, which can also be of help for requirements and specification.
We envisage a web-based tool written in PHP, but we are open to alternative suggestions.
Experiments linked to the DOTS project
Contact: Lorenzo Strigini, Peter Popov
Our research project DOTS concerns how to make "commercial off-the-shelf" software safe to use where money or people's safety are at stake. In relation to our research work, there is room for several projects to measure the reliability of commercial software and the improvement that can be achieved by our methods. Projects in this area have potential for producing very useful results and recognition in scientific publications but require a passion for rigorous thinking. Their design-and-build contents vary from requiring intensive operating system-level programming to no programming at all, using existing applications for all tasks.
They all begin with a feasibility study to select the classes of programs to be used in the experiment.
Tools and implementation languages (if required): to be negotiated.
- Support for a platform diversity experiment. We wish to run an experiment in which the same program is compiled for two or more different platforms (e.g., Windows, MacOS, Linux/Unix, or two Windows machines with different hardware), tested extensively with similar inputs on all platforms and the results compared, logging discrepancies. Interesting problems include efficiency in distributed computation and saving data to survive crashes, since many crashes are expected during the experiment.
NB: a project on this topic was done in 2000/2001, but a continuation project is appropriate to extend its results.
Non-programming, research-oriented projects
(Feel free to propose your own topics for consideration.)
NOTE: Please also check the projects titled "Simulated random testing of software", "How reliable are popular cryptographic products?", "'Breaking Things'" and "Formal development of a program" which are listed under "Design and Build" projects above. These projects may also be done as Research only (non-programming) projects.
Analysing C code: model checking
Contact: Robin Bloomfield
Recent developments in formal methods in the form of model checking have led to a number of advanced prototype tools for the analysis of C programs such as SPIN (http://spinroot.com/spin/whatispin.html) and BLAST. This project would investigate the role of these tools in software development and undertake a trial of them on a range of programs that CSR have access to. An assessment would be made of how effective the tools are and how usable they might be: are they ready to move from the research community to use in real projects? Issues of maturity, scaling and prerequisite skills would need to be investigated. The project would require an interest and skill in mathematical aspects of program verification.
Examples of past projects, projects already taken
These projects can give you an idea of some topics of interest to CSR academics, in case you would be interested in proposing similar topics or extensions to work done in previous projects.
Tool for experiments on human reaction to computer error in pattern recognition test
Contact: Lorenzo Strigini
(For the background to this project, read the description of project topic "New features in tool for experiments on human reaction to computer error: calculator quiz".)
This project will develop parts of a tool for administering a psychology experiment in which the subject has to perform a pattern recognition test, for instance finding a face in a picture of a crowd, with the aid of imperfect advice. This task has been chosen for its similarity to the medical decision tasks we are studying (see examples here and here). The requirement elicitation will take advantage of experience in previous projects on other experimental tools. There may also be some reuse of requirements and specification parts.
The tool will artificially create complex images and challenge the subject to find specific patterns. We envisage an architecture with a web-based front-end written in PHP, but we are open to alternative suggestions.
Components for open source software to automatically collect reliability records
Contact: Lorenzo Strigini
The aim of this project is to produce two important practical contributions to the culture of Open Source software development: better reliability and objective evidence of reliability. Its products will be a small library of components (possibly for just one OS product, ideally a general one that many can use) that automatically collect data about failures of the software (which several OS products already do), the duration of operation and the kinds of demands made on the software. The technical idea behind the software is outlined in a paper by J. Voas here.
Various challenges exist in specifying requirements - from analysing the data collection requirements of reliability assessment, to dealing with issues of user motivation and privacy, among others - and in programming. A successful project may well privilege one of the two aspects, aiming to produce a prototype that proves the feasibility of the concept from either the viewpoint of producing a feasible set of requirements or that of actually implementing the necessary functions.
Tool for converting data files to Google Earth Map files
Contact: Peter Bishop
Develop a generic tool for converting columns of data (e.g. in a tab-separated value format file) into a KML file that can be displayed on Google Earth. The tool should be able to specify:
- the introductory description and folder name for the KML file
- the fields that give the longitude, latitude and altitude of each location, and any necessary unit conversion
- any grouping of points into sub-folders (based on a specified field)
- additional data fields that are to be included in the map descriptor for each location, and their layout
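The conversion described by this list can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not a specification of the tool: the function name `tsv_to_kml` and its parameters are hypothetical, unit conversion is reduced to a single altitude scale factor, and the layout of the extra fields is simply one "field: value" line each.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def tsv_to_kml(tsv_text, name, description, lon, lat, alt=None,
               alt_scale=1.0, group_by=None, extra_fields=()):
    """Convert tab-separated text (with a header row) into a KML string.

    lon, lat, alt and group_by are column names; alt_scale converts the
    altitude column to metres (e.g. 0.3048 if the data is in feet)."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    ET.SubElement(doc, "name").text = name
    ET.SubElement(doc, "description").text = description
    folders = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        parent = doc
        if group_by:
            # One sub-folder per distinct value of the grouping field
            key = row[group_by]
            if key not in folders:
                folders[key] = ET.SubElement(doc, "Folder")
                ET.SubElement(folders[key], "name").text = key
            parent = folders[key]
        mark = ET.SubElement(parent, "Placemark")
        ET.SubElement(mark, "description").text = "\n".join(
            f"{field}: {row[field]}" for field in extra_fields)
        altitude = float(row[alt]) * alt_scale if alt else 0.0
        point = ET.SubElement(mark, "Point")
        # KML expects longitude,latitude,altitude in that order
        ET.SubElement(point, "coordinates").text = (
            f"{float(row[lon])},{float(row[lat])},{altitude}")
    return ET.tostring(kml, encoding="unicode")
```

A real tool would also need to handle missing or malformed values and let the user control the description layout, which is where most of the design work lies.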
Blog-searching tool for psychology research
Contact: Lorenzo Strigini
This project will build a tool that automatically trawls blogs on the web for information to be used in research about the psychology of emotions and judgement. The user and "client" for this tool is Prof Peter Ayton, in the department of Psychology, with whom we collaborate on various projects (cf DIRC). The idea comes from this website, but what is required here is a tool under the direct control of the investigator.
This project presents challenges from the technical point of view as well as from that of eliciting user requirements and delivering a usable, trustworthy tool. The student will be expected to interact and communicate with Prof Ayton and a psychology PhD student in the requirement elicitation, in order to understand how their research can be facilitated by developments in the design of the tool. Please copy your Email to me and Peter Ayton since the project requires a preliminary understanding with the client about such interaction.
Supporting compliance with complex regulations
Contact: Robin Bloomfield
Regulations and standards for companies in the finance and safety critical sector are becoming increasingly complex. This project would look at the range of tool support that might be provided for the user of these standards and investigate how the use of XML in the regulations might improve the type of support that can be provided. The use of linguistic tools for analysis (see work on REVERE http://adelard.com/resources/papers/index.htm) and the role of graphical representations could be investigated (e.g. see the graphotron idea http://zvon.org/ZvonSW/ZvonGraphotron/index.html and the ASCE tool http://adelard.com/software/asce/).
A facility for managing duplication of files on a Macintosh
Contact: Lorenzo Strigini
(Note: this is a Java programming project for a student who has a Macintosh computer and wishes to become proficient in programming for it.)
My disk is cluttered with duplicate files, e.g. documents that were sent to me by Email by multiple people, items that I downloaded from the web ... This is inevitable since I work in collaborations with many people, but it wastes space and complicates my archiving and indexing. A student project last year delivered a basic search facility. This project will build on this foundation: it will provide a helpful user interface that displays useful information about duplicate files and their differences, propose menus of possible actions (delete, move, ...) based on the likely needs of the user given the specific pattern of duplication and differences observed, and make sure to deal well with special file system features like aliases, links and symbolic links. Incremental delivery of features is essential. The student will need initially to study existing tools and their limitations, and then repeatedly elicit further requirements from me on the basis of each delivery.
Application of diversity to spam filters
Contact: Lorenzo Strigini
A theory developed at CSR quantifies the benefits from using multiple, diverse systems for detecting security threats (see http://www.csr.city.ac.uk/people/lorenzo.strigini/ls.papers/2004_ESORICS_FTsecurity/). This project will collect the necessary data for verifying the practical applicability of the theory to the case of spam filters for Email. The student will need to study the principles of spam filters and organise a data collection exercise using two or more diverse spam filters, and collect statistics of the rates of "false positive" and "false negative" errors, for various kinds of messages and varying settings. The results will be: i) a clearly documented procedure for conducting such experiments; ii) an evaluation of the efficacy of the spam filters considered, with descriptions of their false negative and false positive rates; and iii) the data set as a basis for the mathematical evaluation by CSR researchers of the conditions for the practical application of the theory (formally, we need to assess whether a classification of message types can be found such that the failures of the spam filters for each class are statistically independent).
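The statistics to be collected can be illustrated with a short Python sketch. The record layout (one tuple per hand-labelled message, with the verdict of two filters called A and B) and the function names are assumptions made for this illustration, not part of the theory or of any existing CSR tool.

```python
def rates(records, verdict):
    """Return (false_positive_rate, false_negative_rate) for one system.

    records: list of (is_spam, flagged_by_a, flagged_by_b) tuples, which must
    contain at least one spam and one legitimate message.
    verdict: function turning the two filters' flags into the system's decision."""
    spam = [r for r in records if r[0]]
    ham = [r for r in records if not r[0]]
    false_neg = sum(1 for _, a, b in spam if not verdict(a, b))
    false_pos = sum(1 for _, a, b in ham if verdict(a, b))
    return false_pos / len(ham), false_neg / len(spam)

def joint_miss_vs_independent(records):
    """Compare the observed rate at which both filters miss a spam message
    with the rate expected if their misses were statistically independent."""
    spam = [r for r in records if r[0]]
    miss_a = sum(1 for _, a, _ in spam if not a) / len(spam)
    miss_b = sum(1 for _, _, b in spam if not b) / len(spam)
    miss_both = sum(1 for _, a, b in spam if not a and not b) / len(spam)
    return miss_both, miss_a * miss_b
```

For example, `rates(records, lambda a, b: a)` describes filter A alone, while `rates(records, lambda a, b: a or b)` describes a combined system that rejects a message when either filter flags it. Running `joint_miss_vs_independent` separately on each class of message is one simple way to look at the independence question raised above, although a proper assessment would need a statistical test and not just a comparison of two numbers.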
Simulating hardware faults and measuring performance of diagnosis algorithms
Supervisor: Lorenzo Strigini
For high-integrity computing it is important that hardware faults are automatically recognised. We have devised optimal diagnosis procedures (the tricky part is distinguishing transient faults from permanent faults), and we need to measure the dependability improvement they produce, compared against other procedures.
Requires study of simulation methods and tools, and of the scenarios used in previous published studies so that the new methods can be tested on the same scenarios. Special challenges in design and build part: efficiency; making the program recognise situations in which arithmetic with small numbers becomes too imprecise and would give spurious measurements.
Usability issues with mass-market software for scientists and engineers
Supervisor: Lorenzo Strigini
As software like Microsoft Office attempts to become more user-friendly for novice users, it develops defects that are especially frustrating for scientist users: unpredictability, incomprehensibility, non-persistence of the information produced, .... Study scientists' gripes about modern software, whether other users should have similar concerns, what remedies are already provided and whether they work, and what developers could do to help this sector of their market.
Survey of social exclusion in Web sites
Supervisor: Lorenzo Strigini
Advanced features in Web site design may exclude those without expensive computers or communications: poorer citizens, libraries, schools, users abroad. This project will survey Web sites in one or more sectors such as government, universities and utilities, gauge the extent of this problem and identify the more serious problems that website owners and designers should be alerted to.
Analysis of data about Open Source software development
Supervisor: Lorenzo Strigini
Open Source software development is the subject of heated debate - some people claim that it has great advantages for most businesses, others disagree; Microsoft obviously claims that it's dangerously untrustworthy; yet, to quote just one example, the most popular web server, Apache, is the result of an open-source project.
We at CSR are studying the reliability of open-source software within a collaborative research project.
A frequently voiced opinion is that software produced by open-source projects is more reliable than software produced by "conventional" means, or (a less frequently voiced one) that it improves at a faster rate. But nobody has proved this via hard data.
This student project will examine the sequence of bug reports available about an open-source product, and show what this implies about the product's reliability and reliability growth.
There is both a component of clerical research work (which requires a student to demonstrate good planning and self-management) and some problems on which one needs to be inventive and hard-headed (like finding reasonable estimates of how many people were using the software at a given time, among the estimates produced in various ways by different researchers).
There are various possible developments of this line of work. One is to study the trends in the data via a reliability prediction tool we have at CSR, but in any case this is a challenging project on a hot current topic. If done properly, it will be a valuable research contribution.
This project requires study of reliability concepts and of the characteristics of open source processes. For clarifications of the problem to be addressed, you may wish to look at the introductory parts of this paper.
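One simple way to look for a trend in a sequence of bug reports is the Laplace trend test, a standard technique in reliability growth analysis. It is shown here only as an illustration and is not necessarily the method used by the CSR prediction tool. For n failure times observed in the interval (0, T], the Laplace factor compares the mean failure time with the midpoint T/2: clearly negative values suggest that failures are concentrated early (reliability growth), clearly positive values suggest the opposite. The function name and the use of raw report times are assumptions; as noted above, the times would really need to be adjusted for the amount of usage.

```python
import math


def laplace_factor(failure_times, observation_end):
    """Laplace trend statistic for failure times observed in (0, observation_end].

    Negative values point towards reliability growth, positive values
    towards reliability decay; values near zero show no clear trend.
    """
    n = len(failure_times)
    if n == 0 or observation_end <= 0:
        raise ValueError("need at least one failure and a positive observation period")
    mean_time = sum(failure_times) / n
    return (mean_time - observation_end / 2) / (
        observation_end * math.sqrt(1 / (12 * n))
    )


if __name__ == "__main__":
    # Hypothetical report times in days since release, observed for 100 days.
    early_reports = [1, 2, 3, 5, 8, 13, 21, 34]
    print(laplace_factor(early_reports, 100))
```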
Ensuring long-term survival of electronic data - methods for the small user
Supervisor: Lorenzo Strigini
Note: This project is formulated here as a mostly customer-driven project. If you prefer to give it a stronger research bent, I'll be happy to discuss how.
I (like many other people) have on my computer or removable media thousands of useful files, created over about 20 years. However, time threatens their usefulness: the programs with which they were created become unavailable, vendors create incompatibilities between old and new versions, and the media themselves either decay or become unreadable for lack of appropriate hardware.
There is much literature about preserving electronic information, but busy people have no time for digging out the practical methods that have been devised and for applying them.
This project shall produce methods that are practically applicable by me, and in general by an individual or small organisation, to solve at least some aspects of this problem. Examples of such partial solutions would be guaranteeing effective re-usability of files produced with old versions of Word or Excel, or correcting for decay of files held on disk.
The project shall deliver a manual of what I need to do and shall test that I can actually do it, including buying or installing software, practices for handling my data on a day-to-day basis and if necessary a transition procedure for re-organising my files in a way that will facilitate future preservation. I expect that producing scripts for batch processing of files may be part of the work. The focus of the project must be on effective solutions, if necessary at the cost of only protecting against one of the many threats. So, effectiveness, safety, practical applicability (including proper documentation) and moderate cost of the procedures are essential. The project's intellectual challenges are in charting the threats and possible solutions and in tailoring solutions to the needs and constraints of people outside large organisations.
Preliminary steps will be to study: the various threats to survival of electronic files; my requirements as customer, with their relative priorities that will affect trade-offs between them; the solutions available from scholars, software vendors, service vendors or others. This background work is intended to be reusable by others to complete the practical part of the work.
Measuring the performance impact of antivirus software
Supervisor: Lorenzo Strigini
Antivirus software is bought to reduce risk, but it reduces performance to an extent that users are not told (see a reported case in which antivirus software reduced performance to less than half). Yet risk and performance are just two different forms of costs for the user. To trade them effectively against each other, we need to quantify them. This project will produce quantitative knowledge of the performance cost of antivirus software, by both searching the existing literature and performing measurements on actual machines. The project shall produce useful input for decisions about procuring and configuring antivirus software.
Requires study of performance measures and test tools.
Detection and recovery of contents corruption in archives of E-mail messages (mailboxes)
Supervisor: Lorenzo Strigini
Email programs allow users to organise received messages in "mailboxes" for long-term archiving.
Some users thus store essential data in their E-mailboxes.
Yet, these data are vulnerable both to bugs of the mailer software and to disk problems.
Backups are useless if one cannot detect the data corruption.
This project will develop a stand-alone tool that checks the integrity of Eudora mailboxes (similar in format to Unix sendmail and Netscape mailboxes) and tells the user what was lost and how it may be recovered.
Ideally, the tool will be made available to the huge world-wide community of potential users. An open-source arrangement could allow extension of the tool to the needs of different mailers.
Requires study of error-detection and error-correction algorithms and mailbox formats, implementation of algorithms in an adequate language and proper live testing.
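A minimal sketch of the detection side is given below, assuming that messages in the mailbox file are separated by lines beginning with "From " (as in Unix mbox files) and that archived messages are not supposed to change once stored. It records a SHA-256 digest per message and later reports which message positions no longer match. All function names are hypothetical; the sketch does no error correction, and it would mis-split a message whose body contains an unescaped line starting with "From ".

```python
import hashlib


def split_messages(mailbox_bytes):
    """Split mbox-style content into messages at lines starting with b'From '."""
    messages, current = [], []
    for line in mailbox_bytes.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def build_manifest(mailbox_bytes):
    """Return one SHA-256 hex digest per message, in mailbox order."""
    return [hashlib.sha256(m).hexdigest() for m in split_messages(mailbox_bytes)]


def find_corrupted(mailbox_bytes, manifest):
    """Return indices of messages that differ from, or are missing relative to, the manifest."""
    current = build_manifest(mailbox_bytes)
    changed = [i for i, (old, new) in enumerate(zip(manifest, current)) if old != new]
    # Positions present in only one of the two lists count as changed too.
    shorter, longer = sorted((len(manifest), len(current)))
    changed.extend(range(shorter, longer))
    return changed


if __name__ == "__main__":
    box = b"From a\nSubject: x\n\nhello\nFrom b\nSubject: y\n\nworld\n"
    manifest = build_manifest(box)
    damaged = box.replace(b"world", b"w0rld")
    print(find_corrupted(damaged, manifest))
```

Storing the manifest separately from the mailbox (and alongside backups) would let the tool tell the user which messages to restore from which backup.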
Projects linked to the DOTS multi-database server
Contact: Lorenzo Strigini, Peter Popov
The DOTS project has built a "multi-database" server to demonstrate how database users can be protected against the bugs of the existing servers, and to simplify switching vendors when necessary. The following projects are linked to this development work:
- Porting of the SoI publications database to run on the DOTS multi-database server. The School of Informatics maintains a database of its research publications, accessible via the WWW. We wish to port this database to the DOTS multi-database server, both as a realistic test of the server and as a demonstrator of the capabilities of this technology developed by City University.
Multi-version and repetitive-test experiments: test case generator
Contact: Lorenzo Strigini, Peter Popov
For our research on software testing and fault tolerance, we have to run experiments in which multiple versions of a program are run against a similar series of tests, with different ways of controlling the generation of the programs and the sequences of tests. We have built various special-purpose tools for these experiments, and wish to extend them to become more flexible and adaptable. There is scope for several projects, to be specified jointly with the interested students, to cover parts of these needs. The next one in our plans is a tool to generate test cases for a program according to its intended statistical usage profile.
In our current experiment, the program under test (e.g., protection software for a nuclear reactor) receives a stream of input data from a "test data generator". This is essentially a simple simulator, producing data streams as could be produced by the controlled plant, with random variations. We wish to experiment with more complex data patterns, and, more importantly, to make the generator both efficient and flexible, so that it can be easily ordered to simulate various physical effects (noisy inputs, sensor failures) or even to produce input data for a wider class of embedded software. The first step of the project will be to port the existing implementation from Perl to C or another compiled language. The challenge for this project is in matching the flexibility of the tool to its requirements, careful specification of the interfaces between it and the rest of the experiment, considering requirements of extensibility and efficiency (which is very important since we may want to run billions of tests).
Platform: Linux/Unix. Implementation language and tools: C or other to be considered, to interface with other parts of the experiment built in Perl and C. Required: incremental development.
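To make the idea of generating tests from a statistical usage profile concrete, here is a small sketch. It is written in Python for brevity, whereas the project itself targets C or another compiled language, and it is not the existing Perl generator. The profile layout (scenario name mapped to a probability weight and a value range), the scenario names and the Gaussian noise model are all assumptions chosen for the example.

```python
import random


def generate_tests(profile, count, noise_sd=0.0, seed=None):
    """Yield (scenario, value) test inputs drawn according to a usage profile.

    profile: {scenario_name: (weight, (low, high))}; scenarios are chosen with
    probability proportional to weight, values uniformly within (low, high),
    then perturbed by Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)  # a fixed seed makes an experiment repeatable
    names = list(profile)
    weights = [profile[name][0] for name in names]
    for _ in range(count):
        name = rng.choices(names, weights=weights, k=1)[0]
        low, high = profile[name][1]
        value = rng.uniform(low, high) + rng.gauss(0.0, noise_sd)
        yield name, value


if __name__ == "__main__":
    # Hypothetical profile for a temperature sensor input.
    profile = {
        "normal_operation": (0.95, (280.0, 300.0)),
        "overheating": (0.04, (300.0, 350.0)),
        "sensor_stuck_low": (0.01, (0.0, 1.0)),
    }
    for scenario, value in generate_tests(profile, 5, noise_sd=0.5, seed=1):
        print(scenario, round(value, 2))
```

Keeping the profile as data rather than code is one way to get the flexibility asked for above; effects such as sensor failures become additional scenarios or additional perturbation steps.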
Experiments linked to the DOTS project
Contact: Lorenzo Strigini, Peter Popov
Projects linked to the DOTS multi-database server
The DOTS project has built a "multi-database" server to demonstrate how database users can be protected against the bugs of the existing servers, and simplify switching vendors when necessary. The following projects are linked to this development work:
- A translator among SQL dialects. Existing off-the-shelf SQL database servers, despite being compatible in theory with the standard SQL language, in reality "speak" different "dialects", differing in the syntax of queries (and in the range of additional capabilities they offer besides the basic ones specified by the SQL standards). This project will produce an automated translator so that a program written to use - say - Oracle can also use another server, like Interbase or MS SQL. The project will require the use of automated tools for language manipulation like yacc or bison. The translator will be integrated with the DOTS "multi-database server".
- Run-time acceptance tests to protect from program errors: experiments on multiple program versions and acceptance tests of varying strength. In this experiment, multiple versions of a simple application program will be instrumented with reasonableness and correctness checks to detect erroneous results. By running them on large numbers of test cases, the experiment will measure the relationship between the stringency of the specified checks and their ability actually to detect errors.
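The difference between a weak reasonableness check and a stringent correctness check can be sketched on a toy example. Everything below is hypothetical: the application (integer square root), the deliberately faulty version, the two checks, and the use of `math.isqrt` as the reference oracle that decides which results are truly wrong. The sketch measures, for each check, the fraction of erroneous results that the check rejects.

```python
import math


def faulty_isqrt(x):
    """Hypothetical faulty version: rounds to nearest instead of rounding down."""
    return round(x ** 0.5)


def weak_check(x, r):
    """Reasonableness check: the result is non-negative and not larger than the input."""
    return 0 <= r <= max(x, 1)


def strong_check(x, r):
    """Correctness check: r is exactly the integer square root of x."""
    return r >= 0 and r * r <= x < (r + 1) * (r + 1)


def detection_rate(version, check, inputs):
    """Fraction of the version's wrong results that the check rejects (None if no failures)."""
    failures = [x for x in inputs if version(x) != math.isqrt(x)]
    if not failures:
        return None
    detected = sum(1 for x in failures if not check(x, version(x)))
    return detected / len(failures)


if __name__ == "__main__":
    inputs = range(100)
    print("weak:  ", detection_rate(faulty_isqrt, weak_check, inputs))
    print("strong:", detection_rate(faulty_isqrt, strong_check, inputs))
```

In the real experiment the checks would lie between these two extremes, and the interesting measurement is how detection ability changes as the checks are tightened.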