NIRC Diagnostics
NIRC Crashes
Quick Summary:
Details: The new ("P3") NIRC software does not crash as often as the old software, so hopefully this page will be needed only rarely. Still, there are ways in which the new software can crash. One diagnostic is activity in the rabbit tip window. Generally this window shows almost no output except during reboots and DSP reboots. If you see a matrix of numbers, including "passthru," then you have likely suffered a crash. An example:
May 2 14:40:42 rabbit vmunix: unit=0 model=BCE,270-1722-02 Rev.1, file=../driver_share/hostport.c, line=249, unit_addr=fb05c000
May 2 14:40:42 rabbit vmunix: unit=0 csr=94400100
To recover from such a crash, it is usually enough to select Reset NIRC Software from the pull-down menu. This automatically kills and restarts various components of the software. It then runs the same startup sequence used when the software is first started, so you will again be asked for your data directory.
Some crashes are more insidious. For example, if communications between rabbit and maili are lost, you may get the following in the rabbit tip window:
May 14 02:30:52 rabbit vmunix: hostIntrTimeout occurred
May 14 02:30:55 rabbit vmunix: hostIntrRet is -1
May 14 02:30:55 rabbit vmunix: hostIntrRet is -1
May 14 02:30:55 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:55 rabbit rpc.collectd: passthru isr = e
May 14 02:30:55 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit last message repeated 3 times
May 14 02:30:56 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit rpc.collectd: passthru isr = e
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit last message repeated 2 times
May 14 02:30:56 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit last message repeated 2 times
May 14 02:30:56 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit rpc.collectd: passthru isr = e
May 14 02:30:57 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:56 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:56 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:56 rabbit rpc.collectd: passthru isr = e
May 14 02:30:56 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:57 rabbit vmunix: hostIntrRet is -1
May 14 02:30:57 rabbit vmunix: hostIntrRet is -1
May 14 02:30:57 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:57 rabbit vmunix: hostIntrRet is -1
May 14 02:30:57 rabbit last message repeated 3 times
May 14 02:30:57 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:57 rabbit rpc.collectd: passthru isr = e
May 14 02:30:57 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:57 rabbit vmunix: hostIntrRet is -1
May 14 02:30:58 rabbit last message repeated 3 times
May 14 02:30:58 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:58 rabbit vmunix: hostIntrRet is -1
May 14 02:30:58 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:58 rabbit rpc.collectd: passthru isr = e
May 14 02:30:58 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:58 rabbit vmunix: hostIntrRet is -1
May 14 02:30:58 rabbit last message repeated 3 times
May 14 02:30:58 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:58 rabbit vmunix: hostIntrRet is -1
May 14 02:30:59 rabbit vmunix: hostIntrRet is -1
May 14 02:30:59 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:59 rabbit rpc.collectd: passthru isr = e
May 14 02:30:59 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:59 rabbit vmunix: hostIntrRet is -1
May 14 02:30:59 rabbit last message repeated 3 times
May 14 02:30:59 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:59 rabbit vmunix: hostIntrRet is -1
May 14 02:30:59 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:59 rabbit rpc.collectd: passthru isr = e
May 14 02:30:59 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:59 rabbit vmunix: hostIntrRet is -1
May 14 02:30:59 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:59 rabbit vmunix: hostIntrRet is -1
May 14 02:30:59 rabbit last message repeated 3 times
May 14 02:30:59 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:59 rabbit vmunix: hostIntrRet is -1
May 14 02:31:00 rabbit vmunix: hostIntrRet is -1
May 14 02:31:00 rabbit rpc.collectd: Starting debugInfo.
May 14 02:31:00 rabbit rpc.collectd: passthru isr = e
May 14 02:31:00 rabbit rpc.collectd: passthru cvr = 15
May 14 02:31:00 rabbit vmunix: hostIntrRet is -1
May 14 02:31:00 rabbit last message repeated 3 times
May 14 02:31:00 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:31:02 rabbit nirc_klib: SysErr: Device busy ioctl(fd=3, DspSetStart)
May 14 02:31:02 rabbit rpc.collectd: Starting debugInfo.
May 14 02:31:02 rabbit rpc.collectd: passthru isr = e
May 14 02:31:02 rabbit rpc.collectd: passthru cvr = 15
May 14 02:31:02 rabbit vmunix: hostIntrRet is -1
May 14 02:31:02 rabbit last message repeated 3 times
May 14 02:31:02 rabbit rpc.collectd: passthru ATTACH failed
Corresponding output in the "log tail" window is:
May 14 02:30:55 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:55 rabbit rpc.collectd: passthru isr = e
May 14 02:30:55 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:56 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:56 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:56 rabbit rpc.collectd: passthru isr = e
May 14 02:30:56 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:56 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:56 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:56 rabbit rpc.collectd: passthru isr = e
May 14 02:30:56 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:56 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:56 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:56 rabbit rpc.collectd: passthru isr = e
May 14 02:30:56 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:57 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:57 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:57 rabbit rpc.collectd: passthru isr = e
May 14 02:30:57 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:58 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:58 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:58 rabbit rpc.collectd: passthru isr = e
May 14 02:30:58 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:58 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:59 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:59 rabbit rpc.collectd: passthru isr = e
May 14 02:30:59 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:59 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:30:59 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:59 rabbit rpc.collectd: passthru isr = e
May 14 02:30:59 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:59 rabbit rpc.collectd: passthru ATTACH failed
May 14 02:31:00 rabbit rpc.collectd: Starting debugInfo.
May 14 02:31:00 rabbit rpc.collectd: passthru isr = e
May 14 02:31:00 rabbit rpc.collectd: passthru cvr = 15
May 14 02:31:00 rabbit rpc.collectd: passthru ATTACH failed
To recover, first try the Reset NIRC Software menu item. If this does not work, you may have to reboot rabbit, either with the "Reboot Rabbit" option on the pull-down menu or by running reboot_rabbit from a maili xterm. When the script has completed, restart the NIRC observing software. Once it is up, typing a carriage return in the rabbit tip window will give you a "login:" prompt.
If all the scripts fail, the method to reboot rabbit from a maili xterm is as follows:
> telnet k1consoles 2016
> ^] (control right bracket)
> send break
> boot (at the ok prompt)
You may skip the first step if the "rabbit tip via telnet" xterm is still available for use. If it is available, use this xterm to complete the reboot sequence. Only one telnet session may be connected at any given time. If the "tip" window fails to appear, check whether another maili tip window is still open by issuing the command:
ps -axw | grep nircxterm | grep tip
Rabbit Tip Via Telnet Session: no "rabbit login" prompt
Symptom: Pressing return in the xterm labeled "Rabbit Tip Via Telnet" does not lead to a "rabbit login:" prompt.
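The tip-window crash signatures quoted in the NIRC Crashes section can also be counted mechanically from saved output. The following is a minimal Python sketch, not part of the NIRC software. The category labels and the rule used to flag a suspected rabbit/maili communication loss are assumptions made for illustration. The message substrings are copied from the excerpts above.

```python
import re
from collections import Counter

# Substrings copied from the tip-window excerpts in the NIRC Crashes section.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "host_interrupt_timeout": "hostIntrTimeout occurred",
    "host_interrupt_error": "hostIntrRet is -1",
    "passthru_attach_failed": "passthru ATTACH failed",
    "dsp_device_busy": "Device busy ioctl",
}
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_signatures(log_text):
    """Count crash-signature messages in syslog-style text."""
    counts = Counter()
    previous = None  # label matched on the immediately preceding message line
    for line in log_text.splitlines():
        repeat = REPEAT_RE.search(line)
        if repeat:
            # Assumption: a repeat notice refers to the message just before it.
            if previous:
                counts[previous] += int(repeat.group(1))
            continue
        previous = None
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
                previous = label
                break
    return counts


excerpt = """\
May 14 02:30:52 rabbit vmunix: hostIntrTimeout occurred
May 14 02:30:55 rabbit vmunix: hostIntrRet is -1
May 14 02:30:55 rabbit vmunix: hostIntrRet is -1
May 14 02:30:55 rabbit rpc.collectd: Starting debugInfo.
May 14 02:30:55 rabbit rpc.collectd: passthru isr = e
May 14 02:30:55 rabbit rpc.collectd: passthru cvr = 15
May 14 02:30:56 rabbit vmunix: hostIntrRet is -1
May 14 02:30:56 rabbit last message repeated 3 times
May 14 02:30:56 rabbit rpc.collectd: passthru ATTACH failed
"""
counts = count_signatures(excerpt)
# Hypothetical rule: a host interrupt timeout together with a failed passthru
# attach resembles the communication-loss example quoted in the text.
comm_loss_suspected = (
    counts["host_interrupt_timeout"] > 0 and counts["passthru_attach_failed"] > 0
)
print(dict(counts), comm_loss_suspected)
```

A positive result is only a hint that the recovery steps above (Reset NIRC Software first, then a rabbit reboot) may be needed. It does not replace looking at the tip window itself.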
Rabbit daemons not running
Symptom: Rabbit daemons are not running. Typing ct on maili or rabbit does not yield the following results.
on maili...
on rabbit...
nirc 18017 0.0 0.2 132 124 ? S 12:56 0:03 rpc.xycomd
nirc 18016 0.0 0.2 136 116 ? S 12:56 0:00 rpc.motord
nirc 18015 0.0 0.4 304 264 ? S 12:56 0:04 rpc.collectd
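The comparison against the expected listing can be sketched in Python as below. This is an illustration only. It assumes that the three daemons in the sample rabbit listing above are the complete expected set, and that the output of ct has been captured as text. The function name missing_daemons is hypothetical.

```python
# Daemon names taken from the sample rabbit listing above. Treating them as
# the complete expected set is an assumption made for this sketch.
EXPECTED_RABBIT_DAEMONS = ("rpc.xycomd", "rpc.motord", "rpc.collectd")


def missing_daemons(ps_output, expected=EXPECTED_RABBIT_DAEMONS):
    """Return the expected daemon names that do not appear in ps-style output."""
    seen = set()
    for line in ps_output.splitlines():
        # Collect every whitespace-separated token, so a daemon that was
        # started with arguments still matches on its command name.
        seen.update(line.split())
    return [name for name in expected if name not in seen]


sample = """\
nirc 18017 0.0 0.2 132 124 ? S 12:56 0:03 rpc.xycomd
nirc 18016 0.0 0.2 136 116 ? S 12:56 0:00 rpc.motord
nirc 18015 0.0 0.4 304 264 ? S 12:56 0:04 rpc.collectd
"""
healthy = missing_daemons(sample)
# Drop the last line to imitate a listing in which rpc.collectd has died.
degraded = missing_daemons("\n".join(sample.splitlines()[:2]))
print(healthy, degraded)
```

An empty list means every expected daemon was found. Any name returned is a daemon that did not show up in the listing.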
Images fail to write to a scratch directory, or images take a long time to write. Image does not display
Symptom: Images take a long time to complete, and you may think that the failure mode is a rabbit crash. Images also do not display in figdisp. A 1 sec exposure takes on the order of one minute to complete. The output on rabbit is the following:
Mar 12 17:19:34 rabbit acquire_nirc: RFits_create: clnt_create failed with" : RPC: Remote system error - Connection timed out".
Mar 12 17:19:34 rabbit acquire_nirc: Couldn't open nirc@maili:/sdata309/nirc/2008mar13.
Mar 12 17:19:34 rabbit acquire_nirc: createFile failed, trying alternates.
Mar 12 17:20:49 rabbit acquire_nirc: RFits_create: clnt_create failed with " : RPC: Remote system error - Connection timed out".
Mar 12 17:20:49 rabbit acquire_nirc: Couldn't open nirc@maili:/scratch.
Mar 12 17:20:55 rabbit nirc_klib: createFile succeeded with alternate(nirc@maili:/scratch2).
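To see where an image finally landed and how long the fallback took, the messages above can be summarized with a short Python sketch. It is illustrative only. The regular expressions are written against the exact message wording quoted above, the function name summarize_write is hypothetical, and all lines are assumed to fall on the same day.

```python
import re

# Syslog-style prefix: month, day, HH:MM:SS, host, program, then the message.
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s+\S+\s+[^:]+:\s+(.*)$")
FAILED_RE = re.compile(r"Couldn't open (\S+?)\.?$")
SUCCESS_RE = re.compile(r"createFile succeeded with alternate\((\S+?)\)")


def summarize_write(log_text):
    """Return (targets that failed, target finally used, elapsed seconds)."""
    failed, used, times = [], None, []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        hours, minutes, seconds, body = match.groups()
        # Seconds since midnight; assumes the excerpt does not cross midnight.
        times.append(int(hours) * 3600 + int(minutes) * 60 + int(seconds))
        bad = FAILED_RE.search(body)
        if bad:
            failed.append(bad.group(1))
        good = SUCCESS_RE.search(body)
        if good:
            used = good.group(1)
    elapsed = times[-1] - times[0] if times else 0
    return failed, used, elapsed


sample = """\
Mar 12 17:19:34 rabbit acquire_nirc: RFits_create: clnt_create failed with" : RPC: Remote system error - Connection timed out".
Mar 12 17:19:34 rabbit acquire_nirc: Couldn't open nirc@maili:/sdata309/nirc/2008mar13.
Mar 12 17:19:34 rabbit acquire_nirc: createFile failed, trying alternates.
Mar 12 17:20:49 rabbit acquire_nirc: RFits_create: clnt_create failed with " : RPC: Remote system error - Connection timed out".
Mar 12 17:20:49 rabbit acquire_nirc: Couldn't open nirc@maili:/scratch.
Mar 12 17:20:55 rabbit nirc_klib: createFile succeeded with alternate(nirc@maili:/scratch2).
"""
failed, used, elapsed = summarize_write(sample)
print(failed, used, elapsed)
```

For the excerpt above, this reports two failed targets, a final write to nirc@maili:/scratch2, and 81 seconds between the first and last message, in line with the roughly one minute delay described in the symptom.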
XSHOW does not update when observing parameters are updated.
Symptom: XSHOW appears to be hung. Updated observing parameters are not displayed in the XSHOW window.
Rabbit not responding
Symptom: Data taking has halted. A ping check of rabbit from maili indicates that you cannot connect to rabbit. The terminal server connection also behaves as if it cannot talk to rabbit. TKloger pops up a warning that there is something wrong with the DSP. After trying to reboot rabbit from the pull-down menu, the p3 window appears to hang, with the last output reading:
Starting tip session ...
[1] 13837
Starting log tail session ...
[4] 13844
Checking rabbit software ...
[1] Done kill_caRepeater >& /dev/null
Timed out waiting for rabbit software to respond. Will try to initialize...
Guider images are not normal
Symptom: The guider images do not look normal, even though the afternoon guider checkout indicated that the guider was functioning normally. Current images exhibit bad row readouts, as seen in the two screen grabs below.
Problem and Solution: This may indicate that a connector somewhere in the cable chain is not well seated, so that moving in elevation changes the pin contacts. Check the connectors for the guider. Remember that NIRC is removed from FWD Cass so that the instrument may be filled with LHe and LN2, so it is likely that the connectors at the FWD Cass connector panel need to be reseated. Below is a picture of the FWD Cass connector panel, which shows the photometric guide camera connector. Power down the guider and then try reseating the connectors.