NIRSPEC Reliability Improvement
Progress report: 4 Feb 2004

Overview:

A month ago the project was asked to make significant progress on three tasks. The tasks and their status are:

We have added a new task to examine the NIRSPEC power supplies. Three tasks (upgrading the host, correlation research, and crash free periods) have moved further into the background.

Ia.  Speeding recovery from server crashes:

This task is now complete. The fast/smart recovery script was released in time for the late January NIRSPEC run and seems to be working well. Depending on the nature of the server crash, recovery should now be possible in 5 to 15 minutes. Just as importantly, the script requires almost no decisions by the observer, and initial reactions have been very positive. The failure to recover swiftly from two crashes on the night of February 1 was unrelated to the script itself: a link to a library file necessary for operation of the iBoots was incorrect. It has now been fixed.

IIa.  Upgrade instrument host:

This task has moved further into the background. Any work that does take place in the coming months will consist of replacing explicit references to waimea in high-level scripts with a variable host name, followed by testing to confirm functionality.
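As a rough illustration of the first step, the sketch below lists every line under a script directory that names waimea explicitly and shows what a substituted line might look like. It is only a sketch under stated assumptions: the variable name NIRSPEC_HOST and the helper names are hypothetical, and nothing here reflects the actual layout or language of the NIRSPEC scripts.

```python
import re
from pathlib import Path

# Match "waimea" as a whole word so that longer names are left alone.
HOST_PATTERN = re.compile(r"\bwaimea\b")


def find_hardcoded_host(root, pattern=HOST_PATTERN):
    """Return (path, line_number, line_text) for each line naming the host."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


def substitute_host(line, variable="$NIRSPEC_HOST", pattern=HOST_PATTERN):
    """Replace the explicit host name with a variable reference.

    The default "$NIRSPEC_HOST" is a hypothetical name chosen for this sketch.
    """
    return pattern.sub(lambda _match: variable, line)
```

Running find_hardcoded_host on a copy of the script tree would give a checklist of lines to edit by hand; substitute_host only previews the change, so each replacement can still be reviewed and tested individually.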

IIb.  Correlation research:

This task continues as a pure background task.    Recent crashes have reinforced our suspicion that a significant fraction of server crashes occur during times of high rotator demand.     In addition, our interest is piqued by the correlation between power glitches and server crashes.

IIc.  Crash free periods:

Despite our continued search, we have found no further NIRSPEC hardware or software changes that coincide with the rapid increase in the rate of server crashes last spring. Our interest in NIRSPEC's susceptibility to power glitches has, however, led us to note that the current Keck II instrument UPS was brought into service around the same time.

IId.  Characterize communications chain:

Delivery of the fiber attenuator and SCSI analyser near the end of December allowed work to begin during this report period.   Work was hampered somewhat by weather and unrelated fiber work, but despite this the task is now nearly complete.

IIe.  Examine power supplies:

This is a new task which we have recently planned and hope to start in the coming weeks. Historically, NIRSPEC has been extremely sensitive to power glitches, which almost always cause a server crash. We will investigate the possibility that one or more components in the communications chain, the transputers, or a motor is inadequately supplied or buffered.

IIIa.  Reduce communications traffic:

Work on the new keyword server and rotator codes designed to reduce communications volume is well along and much progress was made during this report period.

Issues and Concerns: