distributed.net
(Answer) (Category) distributed.net Faq-O-Matic : (Category) How to participate : (Answer) Isn't it bad to leave your computer running all the time?
There are arguments for both sides, so you could decide either way. There is occasionally quite a bit of discussion about this topic on the mailing list, and the following are some of the observations that have been made.
The mean time between failures of hard drives is about 10 years at 24/7 operation, and normally drives either die after a few months or keep going essentially forever. On most hard drives, turning them off and on causes more damage than leaving them running 24/7. I know of many bits of equipment (10-year-old fileservers) that were turned off for Y2K, and the increased load when the drives were turned back on resulted in drive failure.
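The 10-year MTBF figure can be turned into a rough yearly failure chance. A minimal sketch, assuming a constant failure rate (the exponential model, which ignores the early-death and wear-out ends of the bathtub curve the post describes):

```python
import math

# Assumed figure from the text: roughly a 10-year MTBF for a drive running 24/7.
MTBF_YEARS = 10.0

def annual_failure_probability(mtbf_years: float, years: float = 1.0) -> float:
    """P(failure within `years`) under a constant-failure-rate (exponential) model."""
    return 1.0 - math.exp(-years / mtbf_years)

p = annual_failure_probability(MTBF_YEARS)
print(f"Chance a given drive fails within one year: {p:.1%}")  # about 9.5%
```

Under these assumptions, a fleet of a dozen such drives would see roughly one failure a year, which matches the casual "they die young or run forever" experience only loosely; real drives are not well described by a single constant rate.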

Disk access frequency is not really a factor in HDD failure. Most of the drives that failed on my machines were backup drives which were used relatively infrequently.

Most people now accept that the easiest way to achieve hardware reliability is to put a machine in a cool, low-humidity environment, turn it on, and leave it permanently on. Start-up loads are far more damaging than continuous use. In my opinion, the biggest factors in how long drives last are background vibration, humidity, and how well they are cooled.

A computer that's on 24/7 is actually at slightly LESS (exactly HOW MUCH less is a subject still being debated among the experts) risk of failure, since it isn't experiencing the jolt of power at startup time and isn't having to start the drives spinning. Spinning up from a dead stop is the most stressful time in a hard drive's life - depending on the specific drive, the power required to spin up from a cold start can be as much as *100 times* (that's a worst-case situation - usually it's closer to 3-5 times) what is required to keep it spinning while idle! The sudden draw on the power supply to handle that can cause failure of the system all by itself.
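Some back-of-the-envelope arithmetic shows why that spin-up draw matters for the power supply. The idle wattage, drive count, and rail budget below are assumed, illustrative numbers, not from the text; only the 3-5x and 100x multipliers come from the post above:

```python
# Assumed, illustrative numbers (NOT from the FAQ text):
IDLE_WATTS = 5.0           # idle draw of one 3.5" drive
SPINUP_MULTIPLIER = 5      # upper end of the "3-5 times" typical range quoted above
PSU_BUDGET_WATTS = 120.0   # hypothetical capacity of the supply rail feeding the drives
N_DRIVES = 6               # hypothetical small fileserver

# All drives spinning up at once vs. one at a time (the rest already idling).
simultaneous = N_DRIVES * IDLE_WATTS * SPINUP_MULTIPLIER
staggered = IDLE_WATTS * SPINUP_MULTIPLIER + (N_DRIVES - 1) * IDLE_WATTS

print(f"All drives spin up at once: {simultaneous:.0f} W (budget {PSU_BUDGET_WATTS:.0f} W)")
print(f"Staggered spin-up, one at a time: {staggered:.0f} W")
```

With these assumed numbers, a simultaneous cold start overshoots the rail budget while a staggered start stays well inside it, which is why a power-on event can kill a system that runs happily 24/7.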
Running a laptop 24/7 is just not a good idea; laptops aren't designed to run continuously. Most don't have adequate cooling, and they're much more fragile than standard computers.
Most semiconductor failures come from the slow dispersion of the "doping" (impurity) atoms through the crystal matrix of the substrate (silicon/germanium). The dispersion rate increases with temperature, and even with cooling fans, it's going to be faster when the machine is running than when it is not.

I also have some good news: this effect is usually minimal, and chances are the machine will be hopelessly obsolete before it breaks even under 24/7 conditions.

As someone else mentioned, the extra current knocks dopants from N-type silicon into the P-type silicon at the junction, leading to failure of a transistor. I seem to recall reading that this can happen in a couple of years of use if you seriously overclock and keep the CPU busy, or in ten years or more if you don't overclock. (If you overclock but only occasionally use the machine at full power, it'll live longer. Linux, and probably some other OSes, use the halt instruction in their idle loop to put the CPU into a low-power, wait-for-an-interrupt mode.)

Running any client 24x7 doesn't contribute to failure any more quickly than an idling machine does. Idling means only that the CPU is passing mostly zeros through its registers. The CPU is still pushing instructions; most of them simply do nothing.
Has anyone bothered to look at the additional stress on a CPU that is not running the client? As an example, consider a server that usually sits idle but periodically gets a burst of work. The CPU on this server is going to be cool while idle between jobs, then quickly heats up when a job starts and cools down again when the job is done. Differential temperature changes during these heating and cooling cycles create thermal stresses on the chip. These stresses can cause minor flaws in the chip to expand until a critical circuit is broken and the CPU fails. By running the client, the CPU is always busy, so the thermal variations are minimized.
In my work experience, semiconductor chip failures occurred most often at the bonding-pad level, which is stressed by thermal fatigue rather than by constant high temperature. Maybe interconnects have improved over the years - there are many fewer of them these days. Any equipment that I want to keep running simply stays on all the time. The stress of powering on and off is where and when most electronics failures have occurred for me.
I would suspect that the first failures in a computer system would be observed in the mechanical systems such as the hard drive or in the power supplies.
Integrated circuits which have survived the infant mortality period (usually 48 hours) should last 10-20 years, no matter what you do to them. They should not be adversely affected by the stress of thermal cycling, and dopants should not diffuse at typical junction temperatures, which are on the order of 100 °C.
Two operating-life failure mechanisms of integrated circuits are hot electron injection and electromigration. Hot electrons are produced when a transistor switches states. They cause charge to be trapped in the gate oxide of the transistor, eventually (after many years) changing the behavior of the transistor and causing it to fail. Leaving your system on will hasten its death due to hot electron injection. However, as I stated above, I believe components other than the ICs are likely to fail first. The rate of hot electron injection is also proportional to the voltage and clock speed of the chip and so can be affected by overclocking. Electromigration occurs when the current density in the wire traces on the chip is too high. The flow of electrons can actually begin to move the metal in the wires until it causes an open circuit. This generally only occurs in a poorly designed circuit and should not be a concern.
I have also seen operating-life failures due to random particulate defects on the chip. However, it is not thermal cycling but the electric fields on the chip that cause these defects to kill a circuit. Most defects of this type are weeded out during the infant-mortality stage.
This document is: http://faq.distributed.net/?file=111
This is a Faq-O-Matic 2.721.test.

© Copyright distributed.net 1997-2013 - All rights reserved