Summary

Software

Not connected to ICE server

Symptom
Attempts to change the OUTDIR keyword by running the newdir command result in this error:
	ERROR - MDS Error (13): Not connected to ICE server.  Try connect.
Error setting outdir: ERROR - ERROR - MDS Error (13): Not connected to ICE server.  Try connect
Problem
The MOSFIRE detector server is not connected to the ICE server.
Solution
Follow these steps to re-connect to the ICE server. The troubleshooting efforts outlined here start small and build up to larger efforts.
    OPTION 1: Isn't it a lovely night for a stroll ...
  1. On the MOSFIRE desktop, display the Exposure Engineering GUI.
  2. Click the Connect button to re-establish the connection to the ICE server. Verify that the button labeled Connected: changes from OFF to ON.
  3. If the button labeled Ready: is not currently set to ON, then click the Resume button.
  4. From an xterm window on mosfireserver, execute the newdir command and verify that it now runs without error.
    OPTION 2: Start dusting off the cobwebs gunslinger, because this is starting to get ugly.

    If you try option 1 a handful of times and it does not work, try this next:

  1. Power cycle the augmentix using the power control GUI on the engineering menu.
  2. Wait 3 min.
  3. Try the connect and init functions a couple of times.
    OPTION 3: Steady there Gunslinger. Panicking will only get you into more trouble.

    If you try option 1 a handful of times and it does not work, try this next:

  1. Power cycle the Jade2 using the power control GUI on the engineering menu.
  2. Wait 30 sec.
  3. Try the connect and init functions a couple of times.
  4. Failing that, try power cycling the augmentix one more time.
    OPTION 4: The zombies are walking your way Gunslinger, but I have high hopes for you.

    This will take about 15 min to shut everything down and to reboot the host machines. Of course you need to get SWOC involved because they need to complete the reboots. First resolved using the computer reboot method on 3 Aug 2015 (Marc, Dwight, and Julia).

  1. Shutdown the MOSFIRE operational software.
  2. Reboot kaimana (mosfire unix detector server) - SWOC does this.
  3. Reboot nuu (mosfire host) - SWOC does this.
  4. Run ctx on nuu when it reboots and take appropriate action if needed (you may need to restart all MOSFIRE servers: mosfire restart servers).
    OPTION 5: You are running out of bullets gunslinger, and the zombies are closing in. LOOK OUT!

    Time to try a different data taking disk. Maybe the disk is at fault. Follow the instructions at: Augmentix disk swap procedure (software and hardware swaps) to swap which disk we use via software.

    If you think that you need to swap the physical disk hardware, consider this ... if you are not MOSFIRE's Yoda, you should get a MOSFIRE jedi on the line right now. And summit assistance may be required. First resolved using a disk swap software method on 4 Sept 2015 (Marc & Julia). Hardware swaps completed twice in the past (pre Sept 2015).
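
Whichever option is used, the final check is the same as in Option 1: rerun newdir and confirm that the MDS error is gone. A minimal sketch of that check follows. The function name ice_disconnected is hypothetical, and it matches only the error text quoted in the Symptom above; whether newdir writes the error to stdout or stderr is not known here, so the example usage captures both.

```shell
# Hypothetical helper: succeeds (exit 0) if the text on stdin contains the
# "Not connected to ICE server" error quoted in the Symptom section.
ice_disconnected() {
    grep -qF 'MDS Error (13): Not connected to ICE server'
}

# Example use on mosfireserver (not run here):
#   if newdir 2>&1 | ice_disconnected; then
#       echo "still not connected to the ICE server"
#   fi
```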

Global Server Crash

Symptom
GUIs stop updating. The MOSFIRE server and other servers' status indicators located at the bottom of the MDesktop may turn red. In addition, a server error is displayed in MDesktop and MSCGUI. An examination of the mosfire server logs reveals messages such as
	mosfire_trigger: Trigger loop already in progress. Exiting.
and
	!ERROR! could not write	msg for KEYWORD to send Q (Resource temporarily unavailable)
For a crash, the global server will be down.
Problem
A global server client is blocked or hanging, and the global server is attempting to broadcast a keyword update to it, but it becomes blocked in this attempt. This could indicate a stale client connection (connection persists; i.e., is still registered as a client in the global server even though client is dead). Starting and stopping clients rapidly appears to aggravate this situation.
Solution
A restart of the global server may be needed. Follow these steps:
  1. Determine whether the server is merely hung or actually crashed by typing
    	gpsserv mosfire_
    If the output lists 2 processes, the global server has not crashed but it may be hung; see recovery from blocked server. If the output lists only 1 server then this is a true crash and you should continue with this procedure.
  2. Terminate all non-desktop global server clients, including
    • CSU alerts
    • Xobslog
    Verify that all clients using waitfor have been terminated by executing
    	gps waitfor
    and check that nothing is listed (except fgrep).
  3. Restart the global server by doing either of the following:
    • From the command line: log in as user moseng and type
      	mosfire stop mosfire
      followed by
      	gpsserv mosfire_
      which should list no processes, then
      	mosfire start mosfire
    • From the background menu: select
      	MOSFIRE Control Menu -> Subcomponents... -> Restart_servers
      -> Restart Global Server
    • Check that the global server is broadcasting to clients by executing
      	cshow -s mosfire mmf1scycle mdscycle
      and confirm that the keywords increment. If so, then restart the waitfor clients and proceed with observing. If not, then kill all GUIs (desktop, MAGMA, OA eavesdrop, etc.) and repeat this procedure.
    • Verify that the server indicators at the bottom of the MOSFIRE desktop turn green. The MOSFIRE desktop will automatically reconnect to the new server. If you suspect something is wrong on the desktop, restart the desktop by selecting
      	MOSFIRE Control Menu -> Subcomponents... -> Re-Start Desktop
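
The process-count rule from step 1 (two processes means not crashed, one means a true crash, none after a stop) can be sketched as a small filter. The function name is hypothetical, and the sketch assumes that gpsserv prints one process per non-blank line with no header lines, which should be checked against real output before relying on it.

```shell
# Hypothetical helper: reads `gpsserv mosfire_` output on stdin and classifies
# the global server by the number of processes listed.
# Assumption: one process per non-blank line, with no header lines.
classify_global_server() {
    count=$(grep -c '[^[:space:]]' || true)
    case "$count" in
        0) echo "stopped" ;;
        1) echo "crashed" ;;
        2) echo "running-or-hung" ;;
        *) echo "unexpected ($count processes)" ;;
    esac
}

# Example use on mosfireserver (not run here):
#   gpsserv mosfire_ | classify_global_server
```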

Global Server Hung

Symptom
Similar to a global server crash. GUIs are unresponsive. The MOSFIRE server and other servers' status indicators located at the bottom of the MDesktop may turn red. In addition, a server error is displayed in MDesktop and MSCGUI.
Problem
The global server is hung.
Solution
A restart of the global server is likely necessary. Follow these steps:
  1. Determine whether the server is merely hung or actually crashed by typing
    	gpsserv mosfire_
    If the output lists only 1 server then this is a true crash and you should continue with the procedure above for recovery from global server crash. If the output lists 2 processes, the global server has not crashed but it may be hung; continue with this procedure.
  2. Check that the global server is broadcasting to clients by executing
    	cshow -s mosfire mmf1scycle mdscycle
    and confirm that the keywords increment. If so, then kill and restart all MOSFIRE GUIs (desktop, MAGMA, OA eavesdrop, etc.), thereby unblocking the server, and proceed with observing. If not, then continue to the next step.
  3. Terminate all non-desktop global server clients, including
    • CSU alerts
    • Xobslog
    Verify that all clients using waitfor have been terminated by executing
    	gps waitfor
    and check that nothing is listed (except fgrep).
  4. Restart the global server by doing either of the following:
    • From the command line: log in as user moseng and type
      	mosfire stop mosfire
      followed by
      	gpsserv mosfire_
      which should list no processes, then
      	mosfire start mosfire
    • From the background menu: select
      	MOSFIRE Control Menu -> Subcomponents... -> Restart_servers
      -> Restart Global Server
    • Check that the global server is broadcasting to clients by executing
      	cshow -s mosfire mmf1scycle mdscycle
      and confirm that the keywords increment. If so, then restart the waitfor clients and proceed with observing. If not, then kill all GUIs (desktop, MAGMA, OA eavesdrop, etc.) and repeat this procedure.
    • Verify that the server indicators at the bottom of the MOSFIRE desktop turn green. The MOSFIRE desktop will automatically reconnect to the new server. If you suspect something is wrong on the desktop, restart the desktop by selecting
      	MOSFIRE Control Menu -> Subcomponents... -> Re-Start Desktop
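
The "confirm that the keywords increment" check can also be done by sampling a cycle keyword twice and comparing the values. The sketch below is an illustration only: both function names are hypothetical, and because the exact output format of the keyword tools is not given in this guide, the parser simply takes the last field of the first line. A counter that wraps around between samples would be misreported.

```shell
# Hypothetical helper: prints the last whitespace-separated field of the
# first line on stdin, so it accepts a bare value or "name = value" output.
last_field() {
    awk 'NR == 1 { print $NF }'
}

# Hypothetical helper: succeeds if the second sample is numerically greater
# than the first.
has_incremented() {
    [ "$2" -gt "$1" ]
}

# Example use on mosfireserver (not run here):
#   a=$(show -s mosfire mdscycle | last_field); sleep 5
#   b=$(show -s mosfire mdscycle | last_field)
#   has_incremented "$a" "$b" && echo "global server is broadcasting"
```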

Detector Server Failure

Symptom
Image fails to appear on disk or on image display. The MDS indicator on the MOSFIRE desktop is red. The testAll script indicates that the MDS is in a bad state.
Problem
The MDS (MOSFIRE datataking service) has crashed.
Solution
  1. Restart MDS server from the background menu by selecting:
    	MOSFIRE Control Menu -> Subcomponents -> Restart Servers -> Restart detector
    or by executing these commands on mosfireserver:
    	mosfire stop mds 
    	mosfire start mds
  2. Run testAll to verify that the status of MDS is now OK.
  3. On the Exposure Control GUI on the MOSFIRE desktop, click on CONNECT and verify that the corresponding indicator light turns green.
  4. On the Exposure Control GUI on the MOSFIRE desktop, click on RESUME and verify that the corresponding indicator light turns green.
  5. Run testAll to verify that the status of Checking datataking system is OK.
  6. Kill the observer ds9 tool by clicking File -> Exit.
  7. From the mosfireserver command line, execute the command
    	mosfire stop autodisplay
    to terminate the Python process mosfireMonitorAndDisplay.pyc.
  8. Restart autodisplay and ds9 from background menu via
    	MOSFIRE Control Menu -> Subcomponents... -> Re-start Image Display
  9. Execute the newdir command on the mosfireserver command line to reset the data directory.
  10. Acquire a test image and verify that it appears on the image display and is written to disk.

Global Server Blocked - MSG queue filling up.

Symptom
GUIs become only partly responsive and do not appear to be working correctly. Viewing the log file via the pulldown menu option MOSFIRE Engineering Menu -> Logfile Menu -> Log Tail -> Tail Mosfire (global server) logfile shows a warning message indicating that the message queue is increasing.
Problem
The global server communication is blocked. No further action can occur until the global server is restarted.
Solution
  1. Stop the software: MOSFIRE Control Menu -> Close MOSFIRE windows (GUIs)
  2. Run checkrpc in mosfire server window
  3. If tasks remain, kill the remaining rpc tasks with either:
    • checkrpc -k
    • mosfireKillAllClients
  4. Restart the global server: MOSFIRE Engineering Menu -> MOSFIRE Trouble Recovery Menu -> Restart Global Server
  5. Restart the GUIs. Reset the observer name and directory.

No Sounds from Speakers

Symptom
kEventSounds/soundplay utility does not echo event sounds.
Problem
The soundplay utility may not be running on your machine.
Solution
See the common VNC troubleshooting entry (trouble.html#vnc9) for a solution. You may have to click the Technical Index link to gain access; login information is required.

Detector

Image does not complete (Jade 2 failure)

Symptom
Image concludes prematurely, and the exposure status in the Exposure Status GUI indicates an Exposure Aborted (Read Timeout): Writing FITS file. If this happens in the middle of a dither pattern, the dither sequence will proceed and skip the current position, and the status message will be overwritten in the status GUI. The ABORTED FITS header keyword in the image is T, even though the abort was not initiated by the user.
Problem
The Jade2 electronics failed to read all data from Sidecar. A timeout occurred waiting for new data to arrive.
Solution
Just continue operations as normal. We have not seen this issue persist, and subsequent exposures (triggered manually or in the next image in the dither pattern) proceed as normal.

Detector Server Failure

Symptom
Image fails to appear on disk or on image display. The MDS indicator on the MOSFIRE desktop is red. The testAll script indicates that the MDS is in a bad state.
Problem
The MDS (MOSFIRE datataking service) has crashed.
Solution
  1. Restart MDS server from the background menu by selecting:
    	MOSFIRE Control Menu -> Subcomponents -> Restart Servers -> Restart detector
    or by executing these commands on mosfireserver:
    	mosfire stop mds 
    	mosfire start mds
  2. Run testAll to verify that the status of MDS is now OK.
  3. On the Exposure Control GUI on the MOSFIRE desktop, click on CONNECT and verify that the corresponding indicator light turns green.
  4. On the Exposure Control GUI on the MOSFIRE desktop, click on RESUME and verify that the corresponding indicator light turns green.
  5. Run testAll to verify that the status of Checking datataking system is OK.
  6. Kill the observer ds9 tool by clicking File -> Exit.
  7. From the mosfireserver command line, execute the command
    	mosfire stop autodisplay
    to terminate the Python process mosfireMonitorAndDisplay.pyc.
  8. Restart autodisplay and ds9 from background menu via
    	MOSFIRE Control Menu -> Subcomponents... -> Re-start Image Display
  9. Execute the newdir command on the mosfireserver command line to reset the data directory.
  10. Acquire a test image and verify that it appears on the image display and is written to disk.

Lost connection to Sidecar Server

Symptom
MOSFIRE Dataset Status panel displays the message ERROR: lost communication to sidecar server. The warning message ICE Timeout Exception appears in the Exposure GUI and message queue. Images fail to write to disk.
Problem
The Sidecar server has died.
Solution
  1. Restart the Sidecar server as follows:
    1. Open an xterm on mosfireserver under any MOSFIRE account.
    2. Execute the command
      	vncviewer control1
      to connect with the VNC session on the augmentix computer. The password can be found in the SA password list.
    3. Kill the existing Sidecar server session by clicking the X at the top right of the COLD HxRG SidecarServer terminal window. See screenshot.
    4. Re-start the Sidecar server session by double-clicking the desktop shortcut labeled COLD HxRG SidecarServer. Verify that this launches a new terminal window and that messages start scrolling down the window.
  2. Restart the MDS:
    1. Open an xterm on mosfireserver under any MOSFIRE account.
    2. Execute the command
      	mosfire stop mds
    3. Wait 5 seconds.
    4. Execute the command
      	mosfire start mds
    5. Execute the command
      	modify -s mds resume=1
  3. Reset Data Dir: run newdir in a mosfireserver xterm
  4. Take a test image and check the image size on disk. If the size is smaller than 16853760 bytes, then follow the procedure to recover missing image headers.
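
The MDS restart in step 2 can be rehearsed as a short script. This is a sketch under assumptions: the wrapper names run and restart_mds and the DRY_RUN variable are hypothetical, and only the commands already listed in the steps above are used. With DRY_RUN=1 the commands are printed rather than executed.

```shell
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical function following the "Restart the MDS" sub-steps in order.
restart_mds() {
    run mosfire stop mds
    run sleep 5                  # wait 5 seconds between stop and start
    run mosfire start mds
    run modify -s mds resume=1
}

# Example: rehearse the sequence without touching the instrument.
#   DRY_RUN=1; restart_mds
```

Running it for real on mosfireserver would mean calling restart_mds with DRY_RUN unset, then continuing with newdir and a test image as in steps 3 and 4.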

Lost connection to Sidecar Server #2

Symptom
MOSFIRE Dataset Status panel displays the message ERROR: lost communication to sidecar server. The warning message ICE Timeout Exception appears in the Exposure GUI and message queue. Images fail to write to disk.
and/or
Subsequent attempts to connect MDS to the sidecar fail almost immediately (mds connected=0, although, strangely, you may still be able to take images).
Problem
The time on control1 may not match the time on mosfireserver.
Solution
  1. Re-sync the time on control1:
    1. Open an xterm on mosfireserver under any MOSFIRE account.
    2. Execute the command
      	vncviewer control1
      to connect with the VNC session on the augmentix computer. The password can be found in the SA password list.
    3. Click on the time in the lower right corner to bring up the Windows time dialog
    4. Click on the "Internet Time" tab
    5. Click "Update Now" and wait about 10 seconds
    6. Look for a "Success" message. If the sync fails, repeat the "Update Now" step until it succeeds.
  2. Restart the Sidecar Server on control1
    1. Kill the existing Sidecar server session by clicking the X at the top right of the COLD HxRG SidecarServer terminal window. See screenshot.
    2. Re-start the Sidecar server session by double-clicking the desktop shortcut labeled COLD HxRG SidecarServer. Verify that this launches a new terminal window and that messages start scrolling down the window.
  3. Restart the MDS:
    1. Open an xterm on mosfireserver under any MOSFIRE account.
    2. Execute the command
      	mosfire stop mds
    3. Wait 5 seconds.
    4. Execute the command
      	mosfire start mds
    5. Execute the command
      	modify -s mds resume=1
  4. Reset Data Dir: run newdir in a mosfireserver xterm
  5. Take a test image and check the image size on disk. If the size is smaller than 16853760 bytes, then follow the procedure to recover missing image headers.

Unable to start exposure (filter mismatch)

Symptom
Attempt to acquire an exposure in one of the "Dark" configurations fails; datataking system indicates filter mismatch.
Problem
The requested and actual filter positions differ.
Solution
Reset the demanded filter position to plain Dark by executing the recovery script from the background menu:
	MOSFIRE Engineering -> Trouble Recovery Menu -> Reset Dark Filter
or by issuing the command-line directive:
	modify -s mosfire filtertarg=Dark

Unable to start exposure (Jade 2 error)

Symptom
Unable to take images; attempts to resume MDS fail; errors appear in the SidecarServer output on control1.
Problem
The Jade2 electronics are in a bad state.
Solution
Power cycle all hardware. Do not do this lightly as there is the possibility of inducing detector artifacts (not yet seen on MOSFIRE). This is most easily accomplished by:
  1. mosfireDisconnectWithDate
  2. wait a few seconds
  3. mosfireConnectWithDate

Unable to start exposure (mosfireWfMechs error)

Symptom
When attempting to take images by clicking the Wait & Go button on the MOSFIRE desktop, the data acquisition process exits within seconds with the following error:
	A mechanism error has occured. mosfireWfMechs exiting.
Problem
One or more mechanisms are not at their targeted positions.
Solution
  1. Execute mosfireWfMechs on the command line to receive helpful diagnostic information such as:
    	Error with PUPIL move: status=MISMATCH
    which will indicate which stage is out of position.
  2. Consult the Mechanism Status GUI on the MOSFIRE Desktop to confirm that all stages are at their targeted positions.
  3. If needed, re-send motor moves to send stages to desired positions. Note that the pupil rotator may report that the target position is open but that the actual position is some number such as -43751. This is acceptable.
  4. If you have determined that the conditions are acceptable for continuing with the exposure, click the selector beside Wait & Go and select Go. This will circumvent the step of running mosfireWfMechs before the exposure and allow you to acquire data despite the error condition.
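
The diagnostic line shown in step 1 can be reduced to the stage name and its status with a small filter. This sketch assumes only the message form quoted above (Error with PUPIL move: status=MISMATCH); the function name is hypothetical, and any other message format is ignored.

```shell
# Hypothetical helper: reads mosfireWfMechs output on stdin and prints
# "<stage> <status>" for each line of the form
#   Error with PUPIL move: status=MISMATCH
failed_stages() {
    sed -n 's/^.*Error with \([^ ]*\) move: status=\([^ ]*\).*$/\1 \2/p'
}

# Example use on mosfireserver (not run here):
#   mosfireWfMechs 2>&1 | failed_stages
```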

Unable to start exposure (SCRIPTRUN error)

Symptom
The Wait & Go button on the MOSFIRE desktop Exposure Control widget is inactive (greyed out), preventing you from starting an exposure; however, exposures can still be acquired using the goi command.
Problem
The SCRIPTRUN keyword is set to a non-zero value, perhaps as the result of aborting a script.
Solution
Reset the SCRIPTRUN keyword using either of the following methods:
  1. Execute the following on the command line:
    	modify -s mosfire SCRIPTRUN=0
  2. OR, from the FVWM background menu, select:
    	MOSFIRE Engineering Menu -> MOSFIRE Trouble Recovery Menu -> Reset SCRIPTRUN keyword
This should immediately restore the Wait & Go button to “active” mode.

Data taking sequence aborted (timeout error)

Symptom
The data taking sequence is aborted while it is waiting for an exposure to complete.
Problem
The detector logging script in the Sidecar Server has not been started.
Solution
Start the detector logging script following this procedure (requires password).

Augmentix disk full

Symptom
Data taking fails
Problem
The C drive on the augmentix may be full. This is ideally prevented via IPM task #611.
Solution
Below are the same instructions for cleaning up the log files as the instructions appear on the IPM #611 task. You may need to reconnect to the sidecar server to re-initiate data acquisition.

Image headers missing

Symptom
SAT complains about images not having enough headers.
Image files on disk are smaller than the usual 16853760 bytes.
show -s mosfire csuextname results in none.
Problem
The header information is not written in the FITS headers following a restart of the detector server.
Solution
In MAGMA:
  • Set up a mask different from the current one.
  • Set up the current mask.
  • Execute the current mask.
Take an image and check that the size on disk is 16853760 bytes.
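
The size check can be scripted. The sketch below compares a file against the 16853760-byte figure quoted in this guide; the function and variable names are hypothetical, and the file path in the usage comment is a placeholder.

```shell
# Normal image size quoted in this guide, in bytes.
EXPECTED_BYTES=16853760

# Hypothetical helper: succeeds if the file exists and is at least
# EXPECTED_BYTES long; a smaller file suggests missing headers.
image_size_ok() {
    [ -f "$1" ] || return 2
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -ge "$EXPECTED_BYTES" ]
}

# Example use (placeholder path):
#   image_size_ok /path/to/test_image.fits || echo "image is too small"
```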

Unable to start an exposure (no error shown on the Exposure Status window)

Symptom
After hitting Wait & Go nothing happens. The exposure progress bar is not updated and the Wait & Go button remains inactive. No error message appears on the Exposure Status window.
Problem
The detector server does not respond.
Solution
Power cycle the detector control system. On the background menu select Mosfire Engineering Menu --> Engineering Gui Menu --> Power control GUI. Under Cabinet A:
  • Power OFF the Computer
  • Power OFF Jade2.
  • Power ON Jade2
  • Wait for 30 seconds.
  • Power ON Computer.
  • Wait for minutes.
Select the camera icon on the top left side of the MOSFIRE desktop menu. In the popup window (Exposure System) select:
  • Connect
  • Init

Mechanism Moves

Grating move fails

Symptom
Grating mechanism fails to reach intended target. Position of grating mechanism in status GUI shows "Unknown". Switch value (show -s mmgts switch) does not match target switch value (most easily obtained using mosfireShowMechPositions mmgts).
Problem
The mechanism is moving past intended target position and hitting the limit switch in the negative (grating) direction. Under normal execution, the mechanism is supposed to stop when it hits the stop switch, and when this happens, only it and the position switch should be activated.
Solution
Re-execute the move by using Observing Mode GUI or global server keyword modify -s mosfire setobsmode=filter-grating.

Optional: we may be able to recover from this by backing the mech away from the limit switch (modify -s mmgts step=10).

If the above fails try lower level moves:

  • First move to safe location by executing modify -s mmgts targname="safe grating"
  • Move back to grating position by executing modify -s mmgts targname=HK or modify -s mmgts targname=YJ.

Pupil fails to transition

Symptom
After a filter change that requires the pupil to transition from open to tracking or from tracking to open, the mechanism appears hung. The move never finishes and the status indicates that it is still moving, although the target position may say that it is tracking or open.
Problem
Unknown at this time.
Solution
There are two possible solutions:

    Option A:
  1. Click the Home button on the mdesktop.
  2. Click on Home Pupil Rotator
  3. On the Observing Mode GUI, select H imaging and wait for the pupil to be fully open. The Mechanism Status GUI should show Open in the Pupil Status and Target fields.
  4. On the Observing Mode GUI, select Ks imaging and wait for the pupil to be tracking. The Mechanism Status GUI should show Tracking in the Pupil Target field and a number in the Status field.

    Option B:
  1. Execute these commands in a mosfireserver window:
    	modify -s mmprs reset=1 
    	modify -s mmprs home=1
    and reselect the filter mode.
  2. If the above does not work, execute these commands in a mosfireserver window:
    	modify -s mmprs zero=1 
    	modify -s mmprs reset=1 
    	modify -s mmprs home=1
    and reselect the filter mode.

Note that the zero and reset actions should take a couple of seconds to complete, but the home sequence could take a minute to recover.

Mechanism fails to move following initial power up.

Symptom
Mechanisms will not move and complain of errors. In particular, the grating appears to be in an unknown state and will not home.
Problem
Sometimes when the motor crate is powered on following installation or removal from the telescope, the motor crate comes up in an odd state.
Solution
  1. Run showpower on mosfireserver. Motor Box power should be ON.
  2. modify -s mosfire pwstatb5=0 turns off the motor power. You can verify this with the showpower command.
  3. modify -s mosfire pwstatb5=1 turns the power back on.
  4. Attempt to home or move the mechanism.

Mechanism in ERROR state; unable to move.

Symptom
Attempts to move a stage (including grating, filter, pupil rotator, or hatch/dust cover) fail immediately with an error similar to this:
    MRMS Error (46): Error moving motor.  Status is error. Check comms.
    Error setting home: ERROR - MRMS Error (46): Error moving motor.  Status is error. Check comms.
The STATUS keyword for the stage is Error instead of OK.
Problem
Something is wrong with either the stage, its controller, or its server.
Solution
  1. Attempt to initialize the controller for the stage by setting the corresponding INIT keyword to 1; e.g.,:
    	modify -s mmgts init=1
    or click the appropriate button on the Home widget available on the MOSFIRE desktop.
  2. If the state remains Error, then attempt to reset the controller for the stage by setting the corresponding RESET keyword to 1; e.g.,:
    	modify -s mmgts reset=1
    and try homing the stage again.
  3. If homing fails, then stop and re-start the individual server for the stage by selecting
    	MOSFIRE Control Menu -> Subcomponents... -> Restart servers...
    or manually from the command line via:
    	mosfire stop mmgts
    	mosfire start mmgts
    then re-try homing the stage.
  4. If the state remains Error, try power cycling the hardware. Run the showpower command to ensure that power is on and determine the name of the corresponding power keyword. In this case, pwstatB5 is the power keyword for the motor control box.
    	modify -s mosfire pwstatB5=0
    then wait 10 sec and issue this command:
    	modify -s mosfire pwstatB5=1
    Then re-try homing the stage.
  5. If the state remains Error, try executing the disconnect/connect scripts from the background menu by selecting:
    	MOSFIRE Engineering Menu -> mosfire disconnect
    followed by
    	MOSFIRE Engineering Menu -> mosfire connect
  6. If this fails, then run the disconnect script again via
    	MOSFIRE Engineering Menu -> mosfire disconnect
    then have summit staff verify/re-seat all MOSFIRE connections, then reconnect via:
    	MOSFIRE Engineering Menu -> mosfire connect
  7. If this fails, then try replacing the controller. An instrument support tech on the summit will be required for this task.
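
The power cycle in step 4 can be written out as a reviewable command list before anything is sent to the hardware. In this sketch the function name is hypothetical; the keyword name is whatever showpower reports for the hardware in question (pwstatB5 for the motor control box, per step 4), and the 10-second default mirrors the wait given there.

```shell
# Hypothetical helper: prints the power-cycle commands for a power keyword so
# they can be reviewed before being pasted into a mosfireserver xterm.
#   $1 = power keyword (e.g. pwstatB5), $2 = seconds to wait (default 10)
power_cycle_cmds() {
    keyword=$1
    wait_s=${2:-10}
    printf 'modify -s mosfire %s=0\n' "$keyword"
    printf 'sleep %s\n' "$wait_s"
    printf 'modify -s mosfire %s=1\n' "$keyword"
}

# Example: print the commands for the motor control box.
#   power_cycle_cmds pwstatB5
```

Printing the commands instead of running them keeps a person in the loop, which seems appropriate for a step that removes power from a motor controller.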

FCS turns off or goes inactive

Symptom
FCS button on the control desktop turns red or off.
Problem
FCS has failed or the server has lost communication with FCS.
Solution
  1. Proceed in this order until FCS turns on.
  2. Attempt to turn FCS on via keyword:
           modify -s mosfire fcson=1
           modify -s mfcs enable=1
  3. Attempt to enable FCS from the FCS engineering GUI available from the engineering pulldown menu.
  4. Power cycle FCS using the power control GUI available from the engineering pulldown menu.
  5. Stop and restart the global server and fcs subserver.
           mosfire stop mosfire
           mosfire stop mfcs
           mosfire start mfcs
           mosfire start mosfire
           check the FCS engineering GUI and toggle the power and enable buttons.
  6. Powercycle (the hammer)
           mosfireDisconnectWithDate
           mosfireConnectWithDate
           restart the Augmentix server
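
"Proceed in this order until FCS turns on" is an escalation loop: apply one recovery step, check, and stop at the first success. A generic sketch is given here. Everything in it is hypothetical, including the function name; the check and the individual steps must be supplied by the user as shell functions, since this guide does not specify a keyword for reading the FCS state.

```shell
# Hypothetical driver: $1 is a check command; the remaining arguments are
# recovery steps, tried in order. Stops at the first step after which the
# check succeeds.
try_until_ok() {
    check=$1
    shift
    for step in "$@"; do
        "$step"
        if "$check"; then
            echo "recovered after: $step"
            return 0
        fi
    done
    echo "all steps exhausted"
    return 1
}

# Example wiring (the functions named here would be written by the user):
#   enable_by_keyword() { modify -s mosfire fcson=1; modify -s mfcs enable=1; }
#   try_until_ok fcs_is_on enable_by_keyword restart_fcs_servers
```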

Filter wheel ends a move in an unknown position

Symptom
When selecting an intermediate band filter (J2, J3, H1, H2), the filter wheel in the Mechanism Status of the MOSFIRE Desktop shows a big red question mark symbol.
Problem
The filter is in position but one of the three switches associated with its position was not activated. This type of error might appear for observers using the observing mode J2-spectroscopy.
Solution
  1. Ask your support astronomer to log into a terminal as user moseng.
  2. In the moseng terminal, type in mmf1s. The output will look similar to this:
           =======================================
           Filter   Pos  Location  Switch  Binary 
           =======================================
              open     0         0       2     010
                J2     1      3472       3     011
                J3     2      6944       4     100
                H1     3     10416       5     101
                H2     4     13888       1     001
            NB1061     5     17360       6     110
           =======================================
                            Current               
           =======================================
           unknown     1    -67589       7     111
           =======================================
    Note the negative value in the Location column at the bottom of the previous table.
  3. Force the Switch value for the unknown Pos to be 7.
              modify -s mmf1s switch1=7
    This should eliminate the question mark from the filter in the Mechanism Status GUI.
  4. Once the J2-Spectroscopy observation has been finished:
    • Return the filter position Switch to its original value:
                modify -s mmf1s switch1=3
      Note this will only work for the J2 filter. If the faulty filter switch was on a different position, say H2, then the command should be modify -s mmf1s switch4=1.
    • Home the filter wheel:
      • modify -s mmf1s home=0
      • Click on the house symbol to the upper-left of the MOSFIRE desktop. On the pop-up window, select Home Filter Wheels. Wait for the homing process to complete. If the homing was successful, the filter wheel GUI should be crossed out with a big red "X".
    • Set the new filter: Select any new observing mode, e.g. Dark imaging J.
    If the filter wheel move ends in the unknown position, repeat the last two steps.
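
The restore command in step 4 follows a pattern that can be read off the table: the keyword is switch followed by the Pos value, and the value is the original Switch entry (J2 gives switch1=3, H2 gives switch4=1). The sketch below derives the command from saved table text. The function name is hypothetical, and the keyword pattern is inferred from those two examples only.

```shell
# Hypothetical helper: reads the mmf1s table on stdin and prints the command
# that restores the original Switch value for the filter named in $1.
# Rows are expected as: Filter Pos Location Switch Binary.
restore_switch_cmd() {
    awk -v filter="$1" '
        NF == 5 && $1 == filter {
            printf "modify -s mmf1s switch%s=%s\n", $2, $4
            exit
        }'
}

# Example use (table text saved earlier to a file; placeholder name):
#   restore_switch_cmd J2 < mmf1s_table.txt
```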

CSU-cryogenic slitmask unit

CSU System not Ready

Symptom
The CSU status window indicates that the CSU is not "ready". The result of executing the command
	show -s mosfire csuready
is zero.
Problem
The CSU needs to be reset.
Solution
From the background menu, select:
	MOSFIRE Engineering Menu -> Trouble Recovery Menu -> Power Cycle CSU

CSU Setup Failed

Symptom
When attempting to execute a CSU move, the move fails immediately. CSU status indicates the following error:
	Setup failed: Error sending move command
The CSUREADY keyword is set to a value of -1.
Problem
The CSU needs to be reset.
Solution
From the background menu, select:
	MOSFIRE Engineering Menu -> Trouble Recovery Menu -> Power Cycle CSU
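
The two CSUREADY values documented so far (0 and -1) both lead to the same recovery. A small lookup makes that explicit. The function name is hypothetical, and values other than the two listed in this guide are deliberately left unrecognized.

```shell
# Hypothetical helper: maps a CSUREADY value to the recovery listed in this
# guide. Only the two documented values (0 and -1) are recognized.
csu_recovery() {
    case "$1" in
        0)  echo "CSU not ready: Power Cycle CSU" ;;
        -1) echo "CSU setup failed: Power Cycle CSU" ;;
        *)  echo "no recovery listed for CSUREADY=$1" ;;
    esac
}
```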

Bars are not at desired location

Symptom
The bars did not reach the intended location. As an example, the image below shows several bars in a long slit configuration that did not fully close to the desired slit width.

Problem
In the image above, it is suspected that a single tooth was missed (10 May 2012). Alternatively, a neighboring bar may have pushed the bar out of position.
Solution
Initialize the out of position bars then re-setup and re-execute your mask. To initialize a subset of CSU bars:
  1. If the bar is more than about one alignment-box length away from the desired location:
    1. run m csuinitbar=# where # is the desired bar number seen on ds9.
    2. repeat for other bars.
  2. If the bar is only about one alignment-box length from the desired location (this is quicker than the above, but can cause a fatal error if the bar is far from the intended destination):
    1. Pull-down: "MOSFIRE Engineering Menu" --> "Trouble Recovery Menu" --> "CSU Bar Subset Init" calls mosfireCSUQuickInit
    2. Answer "y" to continue running the script
    3. Enter the bars to be initialized, e.g. 34 35 36 37, and hit enter
    4. Once complete, use MAGMA to re-setup and re-execute your mask
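
When several bars need the per-bar initialization from item 1, the individual commands can be generated from a list of bar numbers. This sketch only prints the commands. The function name is hypothetical, and it rejects 0 because csuinitbar=0 resets all bars rather than a single one.

```shell
# Hypothetical helper: prints one "m csuinitbar=N" command per bar number.
# Rejects anything that is not a positive integer (0 would reset all bars).
init_bar_cmds() {
    for bar in "$@"; do
        case "$bar" in
            ''|*[!0-9]*|0)
                echo "invalid bar number: $bar" >&2
                return 1
                ;;
        esac
        echo "m csuinitbar=$bar"
    done
}

# Example: print the commands for bars 34 to 37.
#   init_bar_cmds 34 35 36 37
```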

If all bars must be initialized (this takes 20-80 min depending on bar current positions), then the procedure is:

  • Set the CSU to "open" mask from the MAGMA UI.
  • Reset all bars using the command
    	m csuinitbar=0 

CSU Setup Failed

Symptom
While setting up the CSU bars for a new mask, the CSU status at the top of the CSU status GUI indicates an error: "Status: Setup Failed. Error setting up bar target positions." The MAGMA buttons never reactivate and you cannot proceed to configuring a mask.
Problem
CSU setup failed.
Solution
  1. In a mosfire server window, run: modify -s mcsus startCSU=1
  2. Re-send the CSU mask setup and execute the mask

If this fails, try stopping and restarting the keyword servers mcsus and mosfire. Then try power cycling the CSU. This sequence worked on March 29, but it is unclear which step cleared the error.

CSU Fatal Error

Symptom
While configuring the CSU bars for a new mask, the CSU status at the top of the CSU gui indicates one of the following errors:
	Status= FATAL ERROR 
	Status= FATAL ERROR and Control Rack Errors: [23:32]
For the two above errors, the CSU faulted while attempting to move bars.
Problem
The CSU electronics have faulted.
Solution
To recover, select the appropriate recovery option based on the state of the system. Determine whether you need to recalibrate all of the CSU bars or just a few. A Subset Init on a few bars can be fast, an Edge detect init takes longer, and a Full init may take over an hour. These options are described below.
  1. Subset Init is appropriate in situations where only a few bars were being moved when the CSU fatal error occurred, such as:
    • moving from an alignment mask to a science mask;
    • moving from a science mask to a small longslit mask;
    • reconfiguring back to an alignment mask following a MIRA.
    Steps to follow for a Subset Init:
    1. Determine the index numbers of all bars requiring recalibration. These numbers are shown in the CSU status GUI (even bars on the left, odd bars on the right) and are also displayed as green overlays on direct images displayed in ds9.
    2. From the background menu, select
      	MOSFIRE Engineering Menu -> MOSFIRE Trouble Recovery Menu -> CSU Bar Subset Init
    3. Enter a list of the bar numbers to initialize.
    4. Wait for the re-initialization process to complete. This should take about one minute per bar.
    5. Setup and execute your next mask.
  2. Edge detect init is appropriate in situations where most of the bars were moving when the CSU fatal error occurred, including:
    • reconfiguring from one mask to another;
    • configuring to open mask;
    • configuring to large longslit.
    Follow these steps to complete an Edge detect init (also referred to here as a Kassis Init):
    1. From the pull-down menu, select MOSFIRE Engineering Menu -> MOSFIRE Trouble Recovery Menu -> CSU:ID bar positions. This will:
      • launch an xterm running IDL and start recovery script;
      • configure MOSFIRE for imaging;
      • acquire a direct image of the slits;
      • display measured bar positions on ds9 image display utility;
      • measure the bar positions; NOTE: if the positions of some bars cannot be determined, then you must perform a Full Init as described below.
      • launch a second xterm and execute the CSU power cycle script;
      • prompt user to continue;
      • halt the CSU (modify -s mcsus stopCSU=1);
      • power off the CSU drives and controller;
      • execute showpower to verify that power is now off;
      • pause for 10 sec;
      • power on the CSU drives and controller;
      • start the CSU (start_csu_cold) and exit second xterm;
      • wait for user to enter Y in first xterm;
      • acquire new image of the mask.
    2. Assess whether the actual bar positions visible in ds9 agree with the predicted positions (slits marked in green). If not, then abandon the Kassis Init method and perform a Full Init instead.
    3. Setup and execute an OPEN mask and wait for bar moves to complete.
    4. Perform a full recovery of the CSU bars via
      	modify -s mosfire csuinitbar=0
    5. Wait for the CSU to complete the initialization by monitoring the CSU status gui completion bar. Note: do not try to send setup files or move until the initialization process is completed.
    6. When finished, setup and execute the desired next mask.
  3. Full init is appropriate when most of the bars require recalibration. This re-initializes all bars but typically takes over an hour to complete. If a full recovery is necessary:
    1. Cycle power to the CSU from the background menu via:
      	MOSFIRE Engineering Menu -> MOSFIRE Troubleshooting Menu -> Power Cycle CSU
    2. Perform a full recovery of the CSU bars via:
      	modify -s mosfire csuinitbar=0
    3. Wait for the CSU to complete the initialization by monitoring the CSU status gui completion bar. Note: do not try to send setup files or move until the initialization process is completed.
    4. When finished, setup and execute the desired next mask.
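
For the Subset Init option, the bar bookkeeping can be sketched in a few lines of Python. This is an illustrative helper only (plan_subset_init is a hypothetical name, not an existing MOSFIRE tool). It relies on two statements from the steps above: even bars appear on the left of the CSU status GUI and odd bars on the right, and re-initialization takes about one minute per bar.

```python
def plan_subset_init(bars, minutes_per_bar=1.0):
    """Group bars by CSU status GUI side and estimate the total init time."""
    unique = sorted(set(bars))
    return {
        "left": [b for b in unique if b % 2 == 0],   # even bars are shown on the left
        "right": [b for b in unique if b % 2 == 1],  # odd bars are shown on the right
        # Rough figure only: the text quotes "about one minute per bar".
        "estimated_minutes": len(unique) * minutes_per_bar,
    }


if __name__ == "__main__":
    print(plan_subset_init([34, 35, 36, 37]))
```

If the estimate approaches the time quoted for a Full init, a Subset Init is probably no longer the faster choice.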

Amplifier Error: ##:128

Symptom
While moving the CSU, the CSU throws an error. The CSU status indicates, for example, "Amplifier Errors: 40:128". Here 40 is the bar number, which may be 1-92.
Problem
The CSU electronics are suspected to be too cold. All instances of amplifier errors to date occurred when the amplifier boards for the bar clutches and brakes were too cold (between 2 and 13 C). There may be other causes of the amplifier error, but those are not yet known.
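
Because the bar number needs to be recorded for later tracking, it can help to pull it out of the status text programmatically. The sketch below is a hedged illustration: parse_amplifier_error is a hypothetical helper, and the "Amplifier Errors: <bar>:<code>" layout is assumed from the single example quoted in the symptom, not from a specification of the CSU status string.

```python
import re

# Assumed layout, based on the one example on this page:
#   "Amplifier Errors: 40:128"  ->  bar 40, code 128
AMP_ERROR = re.compile(r"Amplifier Errors?:\s*(\d+):(\d+)", re.IGNORECASE)


def parse_amplifier_error(status):
    """Return (bar, code) from a CSU status string, or None if no amplifier error is present."""
    match = AMP_ERROR.search(status)
    if match is None:
        return None
    bar, code = int(match.group(1)), int(match.group(2))
    if not 1 <= bar <= 92:
        raise ValueError(f"bar number {bar} is outside 1-92")
    return bar, code


if __name__ == "__main__":
    print(parse_amplifier_error("Amplifier Errors: 40:128"))
```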
Solution
If you suspect that the temperatures are too low, run cabtemps on the mosfire server. The temperature to note is "Between CSU Chassis". If it is below 13 C, suspect a temperature-sensitive amplifier board. Typical output looks like this (the readings are evidently in degrees C, despite the K in the table header):
[61] mosfire@mosfireserver: cabtemps
      
                 MOSFIRE Cabinet Temperatures (K)         
     ===========================================================
     Sensor      Location                                Temp   
     -----------------------------------------------------------
     exttmp1     Air Return                              12.62
     exttmp2     Between CSU Chassis                     18.20
     exttmp3     Middle Right Back of Cabinet            16.33
     exttmp4     TBD                                     n/a
     exttmp5     Dewar Inner Window                      15.67
     ===========================================================
                  Log File    
     -----------------------------------------------------------
     /kroot/data/mosfire/logs/housekeeping/160127_mdhs.log
     Logging is on
      
      
                 MOSFIRE Glycol Status                  
     ===========================================================
     Keyword            What                     Status         
     -----------------------------------------------------------
     glysupflow         Supply                   Flow
     glyretflow         Return                   Normal
     ===========================================================
                  Log File    
     -----------------------------------------------------------
     /kroot/data/mosfire/logs/housekeeping/160127_mdhs.log
     Logging is on
    So what do we do?
  1. Ask the OA to send the telescope to horizon and point it such that you can get to Cass.
  2. Send MOSFIRE to stationary 0 drive angle.
  3. At the telescope, adjust the glycol flow meter from full open to 0.4 gpm.
  4. Wait for the temperature for "Between CSU Chassis" to rise above 13 C.
  5. Follow standard recovery methods for the CSU. Note that you may only need to initialize the failed board.
  6. Please be sure to indicate in the nightlog entry which bar had the amp error so that we can track down which amplifier board may be at fault.
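
The temperature check in this procedure can be sketched as a small Python parser for saved cabtemps output. The function names are hypothetical, and the parsing assumes the column layout of the sample output shown above (the temperature is the last field on the "Between CSU Chassis" line). The 13 C threshold is the one quoted in this section.

```python
CSU_LOCATION = "Between CSU Chassis"
MIN_TEMP_C = 13.0  # threshold quoted in this section


def csu_chassis_temp(cabtemps_output):
    """Return the 'Between CSU Chassis' reading as a float, or None if absent or n/a."""
    for line in cabtemps_output.splitlines():
        if CSU_LOCATION.lower() in line.lower():
            value = line.split()[-1]  # assumed: temperature is the last column
            try:
                return float(value)
            except ValueError:
                return None  # e.g. a sensor reporting "n/a"
    return None


def too_cold(cabtemps_output):
    """True only when a reading was found and it is below the threshold."""
    temp = csu_chassis_temp(cabtemps_output)
    return temp is not None and temp < MIN_TEMP_C


if __name__ == "__main__":
    sample = "     exttmp2     Between CSU Chassis                     18.20"
    print(csu_chassis_temp(sample), too_cold(sample))
```

A missing or unreadable value is reported as "not too cold" here, so a None result should be checked by eye before ruling out a temperature problem.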

The two photos below show where and what to adjust. The glycol flow meter is located on the left-hand side of the instrument when the instrument is at PA=0 in stationary mode. The second photo is a view of the flow meter and brass needle valve. The image shows the setting for full open, with the red line all the way at the bottom, which is marked open. Adjust the needle valve until the red line reaches the 0.4 gpm mark on the flow meter display.

Image of back of the electronics cabinet. Flow meter located on the bottom left.

Image of the flow meter and brass needle valve.

Temperatures

Dewar Inner Window temperature out of range

Symptom
testAll and the e-mails sent by cron jobs report that the inner window temperature is out of range.
Problem
If the temperature is too low, then the heater power may not be on. Check this by showing the value of the keyword.
	show -s mdhs volt1 

If it reads ~0 Volts then the power is not on. If it is ~20 Volts the power is on. There is no in between. 20 or 0. That is it.
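
The two-state reading described above can be written as a small classifier. This is a sketch under assumptions: heater_power_state is a hypothetical helper, and the 2 V tolerance is an illustrative choice for what counts as "~0" or "~20", not a documented value.

```python
def heater_power_state(volt1, tolerance=2.0):
    """Classify a volt1 reading as 'off' (~0 V), 'on' (~20 V), or 'unexpected'."""
    if abs(volt1) <= tolerance:
        return "off"
    if abs(volt1 - 20.0) <= tolerance:
        return "on"
    # The text says there is no in-between, so anything else deserves a closer look.
    return "unexpected"


if __name__ == "__main__":
    for reading in (0.03, 19.8, 10.0):
        print(reading, heater_power_state(reading))
```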

If the voltage is 0, then it is likely that the Kepco power switch in the electronics cabinet is toggled off, and you will need to manually reset it.

Check the plots of the Dewar Window Temps and Window Heater Voltage which will help you confirm that it was toggled off.

Solution
Turn the Kepco switch to the on position.
  1. Rotate MOSFIRE if necessary to the on deck park position. This has the compass rose on the back of the instrument pointing up and to the right.
  2. Open the right bay door.
  3. Find the panel/tray that says Kepco power supply. This tray also has the MAGIQ guider electronics.
  4. Check the position of the "Kepco Crowbar Reset switch" located on the right hand side.
  5. Toggle the switch to the on position.
  6. Now run
    	show -s mdhs volt1 
    and check that volt1=20 V.
  7. Run cabtemps and monitor the window temp. You should see it start to return to the operating range.