10/15/99 Owl shift summary

Valeri Jejer (jejer@uvahea.phys.virginia.edu)
Fri, 15 Oct 1999 08:23:35 -0400 (EDT)

Shift Summary for owl shift, 15/Oct/99

On Shift: Ruth, Valeri.

Hours of beam available at all intensities: 6.8
Hours of beam available at nominal intensity: 6.8
Hours of E832 data (special runs not included): 5.2
Hours of special runs (laser scans, muon runs...): 0.1

Accounting of lost time:

Almost all the DAQ problems you can imagine.

Runs ended this shift:
Run 14839 - 23.3 M evts - BEAM_799
Run 14840 - 0.32 M evts - BEAM_MU_4E12
Run 14842 - 3.75 M evts - BEAM_799

Tapes brought to FCC this shift:

(including tapes for online split)

Accesses during shift:


Overview of shift:

Lots of problems during the first 2 hours, then smooth data taking.

Description of Shift:

run spill time comment
14839 00:00 DAQ monitor is hanging. Ruth tries snapshot but
shows up in logs.
00:30 Arun takes the following onsplit tapes to Feynman:
UPM015-16, UPK009, UPF018, UPH010.
00:40 l3#stat shows that 23.3 M events are logged to
Decide to try to stop the run. Two spills after the
STOP button, triggers are still running, although the
run stop window popped up and shows some activity;
the murmur window keeps updating the usual errors. Stop
00:50 The run is still in the process of stopping; we decide
to pull the plug - do a fresh start.
14840 01:10 After changing tapes, fresh start; call MCR to
put us in muon mode and init a muon run without
flipping any magnets.
01:15 Start muon run.
1 01:16 Missing FFFF in crate 53 -> kfrend. Do another
just to check. Now output shows up in the
~da_ktev/dart/log target_unix_ktevN...log
files. Check that ktev2_drv2 % device used matches
more or less ktev2(5) on the damp display - but
everything is fine at the moment anyway.
4 01:20 DDD veto from ktev1 and ktev3 -> kfrend.
01:22 Stop muon run, call MCR to go back to e799 mode.
ktev4 memory lights kept flashing for a couple more
minutes after trigger stopped.
14841 01:30 We are in 799 mode, init new run.
01:35 Start run with missing FFFF in crate 53 -> kfrend.
01:36 Lost permit in stream 5 -> kfrend -> 2 errors from
latches in crate 33 sl. 8 and 9 -> kfrend.
01:41 Triggers are running, but histograms are not
updating. Murmur shows the following errors in every CPU:
"DFM_W_GROUP_IS_EMPTY No providers in service group
[L3ServiceGroup] ... dfm_schedule_list failed"
We don't know what this means, but suspect
something bad and decide to do a fresh start again.
14842 01:45 Init new run after fresh start.
01:50 Pipeline init error in crate 8. Resetting pipeline
w/DBC does not fix it. Reboot crate 8 and reset
pipeline again. OK now.
1 01:58 Start run with missing FFFF in crate 53 followed by
2 01:59 Trigger is in error state -> kfrend.
3 02:00 Lost permit in stream 5 -> kfrend.
4 02:01 Pipeline errors - reset pipeline + kfrend.
5 02:03 Lots of "Hardware cluster seed has energy less..."
errors and MF11 sector on the laser illumination
plot is black while the rest of CsI is blue. Gone
next spill.
02:15 Lots of CsI channels with large gain drift. It does not
seem to get better or worse. Val's alarms go off
every other spill.
02:40 Finally decided to page Julie; she responds within
a few seconds and is looking into the gain drift.
02:48 DAQ_DEAD 9.55 sec for unknown reason.
03:07 While trying to disable the CsI gain drift alarms for a
while, noticed the Main Alarm screen is absent - it may
not have been there since the start of the run.
03:25 DCAlign: DC2X is off ~50 um. DC4X is off ~70 um and
rotated ~20 urad.
Ed called and said that gains are OK, but reference
plot has not been updated for ~12 days.
03:30 Julie shows up and fixes CsI gain problem.
03:45 Stop the run to update CsI gain.
14843 03:49 Start new run with no ktevana errors, but screens 2
and 5 (DC and filter status) didn't come up. Restarted
them manually.
03:55 DYC-53 busy -> kfrend.
03:57 Missing FFFF word in crate 53 -> kfrend.
05:05-05:12 No beam.
04:32 and 06:30: got TRD alarms caused by the SDAQ check_servers
process running from SDAQ v6_04 instead of v6_05.
Labelled tapes.
Follow up Notes on Previous Shift

Items to Follow up on Next Shift

Do drt_fresh_start after stopping 14843.

If DAQ monitoring dies, try to save the system "snapshot".

If down time > 1 hr., do a laser scan and page JasonL.