Troubleshooting NIRSPEC Server Problems
The recover script automates and speeds up the diagnosis of, and recovery from, NIRSPEC server crashes. This script replaces sections 2, 3, and 4 with minimal user input.
If you suspect the server has crashed, or a pop-up tells you the server has crashed, please do the following:
From a waimea prompt, issue the command recover. The script will check server health, perform various tests, and take remedial action if necessary. The script may prompt you and will report its progress.
After recover has successfully completed, restart the NIRSPEC Control Software via the pull-down menu. Please obtain any needed calibrations before initializing the filter wheels, rotator, grating, or echelle.
If recover fails for any reason, please call your support astronomer.
If a pop-up message warning of a server crash has appeared, go directly to the next section.
You suspect the server has crashed.
Issue the command ct. If fewer than 3 server processes are listed, the server has crashed. Assume for the moment that the crash is relatively benign and go on to the next section.
Click here to see what output from the ct command should look like.
Click here if you saw 3 server processes but NIRSPEC is still unresponsive.
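The rule of thumb above (fewer than 3 server processes means a crash) can be written down as a small shell sketch. The function name server_state and its output strings are hypothetical and are not part of the NIRSPEC software. The count is supplied by hand from the ct listing, since the exact format of that listing is not reproduced here.

```shell
# Hypothetical helper, not part of the NIRSPEC tools.
# Argument: the number of server processes you counted in the ct output.
server_state() {
    count="$1"
    if [ "$count" -lt 3 ]; then
        # Fewer than 3 server processes: treat as a crash.
        echo "crashed"
    else
        # All 3 processes are listed; if NIRSPEC is still unresponsive,
        # the problem lies elsewhere.
        echo "processes-present"
    fi
}
```

For example, server_state 2 reports a crash, while server_state 3 reports that all processes are present.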
If you are recovering from a server crash, start by issuing the command:
kill_all
Once kill_all has finished running, power cycle the matchbox:
iboot mbox
Once this script has finished running and power is restored to the matchbox, try to restart the server:
runserver
Here is what the output should look like during a successful start.
If you did not get a waimea prompt back and the server startup is hung, press Control-C.
If the server startup was successful, restart the control software using the background menu. Don't forget to enable night time mode after the full start up if you want the instrument and telescope to talk to each other.
If the server startup was unsuccessful and you did not get a waimea prompt back, you need to jump to section 4.
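The sequence in this section (kill_all, then iboot mbox, then runserver) could be chained so that a failed step stops the sequence. The following is only a sketch: run_steps is a hypothetical helper, and it assumes each command exits with a non-zero status on failure, which this page does not confirm for kill_all, iboot, or runserver. A hung runserver would still need Control-C.

```shell
# Hypothetical helper: run each command in order and stop at the first
# one that fails.
# Assumption: each step returns a non-zero exit status on failure.
run_steps() {
    for step in "$@"; do
        echo "running: $step"
        # $step is left unquoted on purpose so that "iboot mbox" is split
        # into a command and its argument.
        if ! $step; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
    echo "all steps completed"
}

# Usage at a waimea prompt (not executed here):
#   run_steps kill_all "iboot mbox" runserver
```

Running the steps by hand, as described above, remains the safer choice when you want to read the output of each command before continuing.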
You are likely reading this section because you are having trouble starting the server or killing relic server processes. Follow these steps:
kill_all
Once kill_all has finished running, power cycle all three boxes in the communications chain:
iboot all
Once this script has finished running and power is restored, try to restart the server:
runserver
Here is what the output should look like during a successful start.
If you did not get a waimea prompt back and the server startup is hung, press Control-C.
If the server startup was successful, restart the control software using the background menu. Don't forget to enable night time mode after the full start up if you want the instrument and telescope to talk to each other.
If the server startup was unsuccessful and you did not get a waimea prompt back, you need to jump to section 5.
It is assumed that you have worked your way through sections 2, 3, and 4 above and that simple power cycles of the communications chain components have failed to get the server running again. You can try a reboot of the NIRSPEC host computer waimea and then (this is important) work through section 4 again.
If none of the steps described above has gotten the server running again, here are some things to investigate:
Loose cabling. There have been a few instances where the server could not be started due to a loose power cable or SCSI connector. Have someone check the seating on all cabling going into and out of the black boxes and the match box. After confirming the seating of cabling on each component, power cycle it.
No Power To Instrument. The server may not start because instrument power has failed. One possible cause of power interruption is low glycol flow. When the flow drops below 1.1 gallons per minute, all electric power is turned off to prevent overheating. There is a Hedland flow meter on the glycol unit that looks like a vertical bar graph. Have a summit tech or OA check the flow level shown on this meter.
Next have a summit technician or OA check the Pulizzi power strip up on the NIRSPEC electronics rack, located in the forward (toward the telescope) end on the right (away from the Cass platform) side. Caution: be sure to have the person inspecting the strip verify that all 8 of the individual circuits on the power strip are ON as shown in the picture. There is a weak ground loop somewhere in the power wiring on the module that will cause the "Master" green light at the right end of the strip to glow even when all power is off; an untrained or hasty observer may think this glow means that instrument power is on. Not true. They should check all 8 of the lights in the horizontal row near the center of the strip. Note that this picture is a little out of date. The double pull master power switch is now a single pull switch.
If NIRSPEC does not have power, the next step is to find out why. The likely culprits are low glycol flow, high glycol temperatures (chiller failure), failure of a fan within the NIRSPEC rack, or rack overheating from some other cause.
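The low-flow cutoff quoted above (power is removed when glycol flow drops below 1.1 gallons per minute) can be expressed as a small check for a reading taken by eye from the flow meter. This is a sketch only: flow_ok is a hypothetical name, and nothing here reads the meter automatically.

```shell
# Hypothetical helper: succeed (exit 0) if the flow reading, in gallons
# per minute, is at or above the 1.1 gpm cutoff; fail otherwise.
# awk does the decimal comparison, which [ ... ] cannot do.
flow_ok() {
    awk -v flow="$1" -v min="1.1" 'BEGIN { exit !(flow + 0 >= min + 0) }'
}

# Usage:
#   if flow_ok 0.9; then echo "flow OK"; else echo "flow below cutoff"; fi
```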
Bad fiber connections. If the check command indicates bad fiber comms, then this might be the place to start looking. First, double check the current fiber plug-to-jack matings both at the instrument and at the computer room interconnect panel, to be sure that the correct fibers are plugged in to the correct jacks. The posted procedures for the moves have the correct information.
Second, if the connections are right, have an Electronics Tech who's been trained to use the fiber tester check the fibers in use for losses. One of the fibers may have gotten dirty or been damaged. If a bad fiber is identified, swap in a spare (there are multiple spares in each bundle) and notify the Lead NIRSPEC I.S. as to which fiber is now in use, so that the procedures can be updated.
Something at the instrument is stuck. If all else has failed you can try a power cycle of the whole instrument. Here are some power procedure notes. Be aware that a power cycle of the whole instrument will also power cycle the on-instrument PXL electronics. This may mean you have to perform a number of additional recovery steps related to the guider such as reloading the driver and restarting Xguide.
There are several log files relating to the server that are kept in the overall log file directory on waimea, /sdatat600/logs:
Note that for the user "nirspec" on waimea, the alias cdlog will take you directly to this directory of log files.
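When looking through the log directory, it can help to see which files were written most recently. A minimal sketch follows, assuming a standard ls and head are available on waimea. The function newest_logs is hypothetical and is not one of the NIRSPEC aliases.

```shell
# Hypothetical helper: list the N most recently modified entries in a
# directory, newest first. N defaults to 5.
newest_logs() {
    dir="$1"
    n="${2:-5}"
    ls -t "$dir" | head -n "$n"
}

# Usage, with the log directory named above:
#   newest_logs /sdatat600/logs 10
```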