Systems that can detect and repair their own faults
Via [[http://www.cosmico.net/blog/2006/05/microkernels.html|cosmico.net]] I came across the recent [[http://www.cs.vu.nl/%7East/reliable-os/|second act of the Tanenbaum-Torvalds Debate]]. Here are some of the quotes I found most interesting (the title of this post is one of them):

>Try rebuilding the entire operating system as described in the manual. The whole build, kernel + user-mode drivers and all the user-mode servers (125 compilations in all) takes about 5-10 seconds.

>Recently, my Ph.D. student Jorrit Herder, my colleague Herbert Bos, and I wrote a paper entitled Can We Make Operating Systems Reliable and Secure? and submitted it to IEEE Computer magazine, the flagship publication of the IEEE Computer Society. It was accepted and published in the May 2006 issue. In this paper we argue that for most computer users, reliability is more important than performance and discuss four current research projects striving to improve operating system reliability. Three of them use microkernels.

>The problem with distributed algorithms is lack of a common time reference along with possible lost messages and uncertainty as to whether a remote process is dead or merely slow. None of these issues apply to microkernel-based operating systems on a single machine.

>When two or more processes can access the same data structures, you have to be very, very careful not to hang yourself. It is exceedingly hard to get this right, even with semaphores, monitors, mutexes, and all that good stuff.

>Even the people working on Vista see they have a problem and are moving drivers into user space, precisely what I am advocating.

>Actually, MINIX 3 and my research generally is **NOT** about microkernels. It is about building highly reliable, self-healing, operating systems. I will consider the job finished when no manufacturer anywhere makes a PC with a reset button.

>The average user does not care about even more features or squeezing the last drop of performance out of the hardware, but cares a lot about having the computer work flawlessly 100% of the time and never crashing. Ask your grandma.

>This is our goal: systems that can detect and repair their own faults.