Troubleshooting
Autoreduction
The autoreduction system is complex and can fail for many reasons. This section lists common issues and their solutions.
The usual starting point when autoreduction fails is an error reported in monitor.sns.gov.
The error message shown there is usually very brief or truncated. For the complete error trace, look at the error log file
/SNS/REF_M/IPTS-XXXX/shared/autoreduce/reduction_log/REF_M_YYYYY.nxs.h5.err
This file, along with its .log counterpart, is created for each autoreduction run by the
post_processing_agent.
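To bring up the end of these files quickly, here is a minimal Python sketch. It assumes the directory layout of the .err path above. The helper names `reduction_log_paths` and `tail` are hypothetical, and the name of the .log counterpart (same stem, `.log` instead of `.err`) is an assumption.

```python
from pathlib import Path

# Directory layout copied from the .err path above; IPTS and run numbers are filled in.
LOG_DIR_TEMPLATE = "/SNS/REF_M/IPTS-{ipts}/shared/autoreduce/reduction_log"


def reduction_log_paths(ipts: int, run: int) -> tuple[Path, Path]:
    """Return the (.err, .log) files for one run (hypothetical helper)."""
    log_dir = Path(LOG_DIR_TEMPLATE.format(ipts=ipts))
    stem = f"REF_M_{run}.nxs.h5"
    # Assumption: the .log counterpart shares the stem and only swaps the suffix
    return log_dir / f"{stem}.err", log_dir / f"{stem}.log"


def tail(path: Path, n: int = 40) -> list[str]:
    """Return the last n lines of a text file; a Python traceback ends there."""
    with path.open(errors="replace") as handle:
        return handle.readlines()[-n:]


if __name__ == "__main__":
    # Example values: the IPTS and run used in the re-run example of this section
    err_file, _ = reduction_log_paths(ipts=34262, run=43834)
    if err_file.exists():
        print("".join(tail(err_file)), end="")
    else:
        print(f"No error log found at {err_file}")
```

From a shell, `tail -n 40` on the .err file gives the same view.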
To check whether the error is reproducible, you can re-run the autoreduction script manually with the same arguments. For instance, to reduce run 43834, save all output to a temporary directory, and prevent the HTML report from being uploaded to the livedata server, run:
$ cd /SNS/REF_M/shared/autoreduce/
$ mkdir test_20250123
$ cp reduce_REF_M.py test_20250123/
$ cd test_20250123/
$ mkdir output
$ pixi shell --manifest-path /usr/local/pixi/mr_reduction # or mr_reduction-dev
(mr_reduction)
$ python reduce_REF_M.py /SNS/REF_M/IPTS-34262/nexus/REF_M_43834.nxs.h5 ./output --no_publish
For an explanation of the autoreduction script arguments, type:
(mr_reduction)
$ python reduce_REF_M.py --help
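When several runs failed, the manual re-run can be scripted. This sketch assumes it is run inside the pixi shell from the test directory that holds the copy of reduce_REF_M.py, and that every NeXus file follows the /SNS/REF_M/IPTS-XXXX/nexus/REF_M_YYYYY.nxs.h5 pattern of the example. `rerun_command` and `rerun_many` are hypothetical names, not part of mr_reduction.

```python
import subprocess
import sys
from pathlib import Path


def rerun_command(ipts: int, run: int, output_dir: str = "./output") -> list[str]:
    """Build the same command line as the manual re-run above (hypothetical helper)."""
    # Assumption: NeXus files follow the path pattern of the example
    nexus_file = f"/SNS/REF_M/IPTS-{ipts}/nexus/REF_M_{run}.nxs.h5"
    # sys.executable is the interpreter of the active environment (the pixi shell);
    # --no_publish keeps the HTML report off the livedata server
    return [sys.executable, "reduce_REF_M.py", nexus_file, output_dir, "--no_publish"]


def rerun_many(ipts: int, runs: list[int], output_dir: str = "./output") -> dict[int, int]:
    """Re-run several runs one after the other and return {run: exit status}."""
    Path(output_dir).mkdir(exist_ok=True)
    statuses = {}
    for run in runs:
        completed = subprocess.run(rerun_command(ipts, run, output_dir), check=False)
        statuses[run] = completed.returncode
    return statuses
```

For example, `rerun_many(34262, [43834])` reproduces the command above; a non-zero status flags a run whose failure is reproducible.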
If a debugging session proves necessary,
you can use an IDE such as PyCharm or VS Code to run the autoreduction script
and set breakpoints within the modules of the mr_reduction package,
even if you only have read-only access to them.
This is the situation when debugging on one of the analysis machines, where the package lives in the pixi environment
/usr/local/pixi/mr_reduction-dev/.pixi/envs/default/lib/python3.11/site-packages/mr_reduction.
Alternatively, you can set up your own mr_reduction pixi environment in your home directory
so that you can edit the modules and insert pdb.set_trace() statements.
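If editing the modules is not an option, post-mortem debugging is another route. It opens pdb at the frame that raised the exception, even when that frame lies in read-only mr_reduction code. Python's built-in `python -m pdb reduce_REF_M.py ...` already does this interactively once you type `c` to start the script. The sketch below is a hypothetical wrapper (the file name debug_reduce.py and the function `run_with_postmortem` are assumptions) that makes the steps explicit and can turn the prompt off. It is a swapped-in technique, not a workflow prescribed by mr_reduction.

```python
"""Hypothetical wrapper (debug_reduce.py): run a reduction script, open pdb where it fails.

Usage, inside the pixi shell:
    python debug_reduce.py reduce_REF_M.py /SNS/REF_M/IPTS-34262/nexus/REF_M_43834.nxs.h5 ./output --no_publish
"""
import pdb
import runpy
import sys
import traceback


def run_with_postmortem(script: str, args: list[str], interactive: bool = True) -> int:
    """Execute `script` as __main__ with `args`; on an uncaught exception, print it and debug it."""
    saved_argv = sys.argv
    sys.argv = [script, *args]  # the script's argument parser sees the usual command line
    try:
        runpy.run_path(script, run_name="__main__")
        return 0
    except SystemExit as exc:  # e.g. --help, or an explicit sys.exit() in the script
        if exc.code is None:
            return 0
        return exc.code if isinstance(exc.code, int) else 1
    except Exception:
        traceback.print_exc()
        if interactive:
            # Opens pdb at the frame that raised, possibly inside installed mr_reduction code
            pdb.post_mortem(sys.exc_info()[2])
        return 1
    finally:
        sys.argv = saved_argv


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_with_postmortem(sys.argv[1], sys.argv[2:]))
```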
Live Reduction
The usual sign of a live reduction failure is that monitor.sns.gov does not show reduction results, as in the following screenshot:
This case produces no error message, so there are a few things to check:
Logs:
/SNS/REF_M/shared/livereduce/REF_M_live_reduction.log
/var/log/SNS_applications/livereduce.log in server bl4a-livereduce.sns.gov.
Service:
> sudo systemctl status livereduce
● livereduce.service - Live processing service
Loaded: loaded (/usr/lib/systemd/system/livereduce.service; enabled; preset: disabled)
Active: active (running) since Thu 2025-04-24 09:40:09 EDT; 1h 30min ago
Main PID: 3797548 (livereduce.sh)
Tasks: 15 (limit: 151899)
Memory: 558.9M
CPU: 12.789s
CGroup: /system.slice/livereduce.service
├─3797548 /usr/bin/bash /usr/bin/livereduce.sh
└─3797757 python3 /usr/bin/livereduce.py
Service processes, which are owned by user snsdata:
> ps -u snsdata -o pid,etime,stat,command
PID ELAPSED STAT COMMAND
3797548 01:33:13 Ss /usr/bin/bash /usr/bin/livereduce.sh
3797757 01:33:13 Sl python3 /usr/bin/livereduce.py
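The service and process checks above can be combined into one script. This minimal sketch assumes the unit name `livereduce` and the two commands shown in the listings; all function names are hypothetical. `systemctl is-active --quiet` reports only through its exit status and usually does not need sudo.

```python
import subprocess

# Processes expected under user snsdata, taken from the ps listing above
EXPECTED_COMMANDS = ("/usr/bin/livereduce.sh", "/usr/bin/livereduce.py")


def service_is_active(unit: str = "livereduce") -> bool:
    """True when systemd reports the unit as active (exit status 0)."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def parse_ps(output: str) -> list[dict[str, str]]:
    """Parse `ps -o pid,etime,stat,command` output, skipping the header line."""
    rows = []
    for line in output.strip().splitlines()[1:]:
        parts = line.split(None, 3)  # COMMAND may itself contain spaces
        if len(parts) == 4:
            rows.append(dict(zip(("pid", "etime", "stat", "command"), parts)))
    return rows


def missing_commands(rows: list[dict[str, str]]) -> list[str]:
    """Expected commands that no listed process is running."""
    return [cmd for cmd in EXPECTED_COMMANDS if not any(cmd in row["command"] for row in rows)]


def suspicious_states(rows: list[dict[str, str]]) -> list[str]:
    """PIDs in a stopped (T/t) or zombie (Z) state, which would likely stall live reduction."""
    return [row["pid"] for row in rows if row["stat"][:1] in ("T", "t", "Z")]


def snsdata_processes() -> list[dict[str, str]]:
    """Run the same ps command as above and parse its output."""
    ps = subprocess.run(["ps", "-u", "snsdata", "-o", "pid,etime,stat,command"],
                        capture_output=True, text=True, check=False)
    return parse_ps(ps.stdout)
```

A healthy host gives `service_is_active()` as True, an empty `missing_commands(snsdata_processes())`, and an empty `suspicious_states(...)`.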
Red Herring: dozens of log entries “Run paused”, “Run resumed”
You may see dozens of log entries like the following in the span of one or two seconds:
2025-04-24 09:40:13,205 - Mantid - INFO - Scan Stop: 46
2025-04-24 09:40:13,206 - Mantid - INFO - Annotation: [Run 44326] Scan #46 Stopped.
2025-04-24 09:40:13,207 - Mantid - INFO - Run paused
2025-04-24 09:40:13,207 - Mantid - INFO - Annotation: Run 44326 Paused.
2025-04-24 09:40:13,209 - Mantid - INFO - New peak: 139 151
2025-04-24 09:40:13,212 - Mantid - INFO - Run paused
2025-04-24 09:40:13,212 - Mantid - INFO - Annotation: [NEW RUN FILE CONTINUATION] Run 44326 Paused.
2025-04-24 09:40:13,216 - Mantid - INFO - Run resumed
2025-04-24 09:40:13,216 - Mantid - INFO - Annotation: Run 44326 Resumed.
2025-04-24 09:40:13,216 - Mantid - INFO - Scan Start: 47
2025-04-24 09:40:13,216 - Mantid - INFO - Annotation: [Run 44326] Scan #47 Started.
These entries do not indicate a problem with the live reduction. They come from a “rocking curve” procedure that the instrument scientists perform during an alignment scan or a polarized-beam measurement. Each pause corresponds to a change in sample position or spin state.
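To tell this pattern apart from a real failure in a long log, a small filter helps. The sketch assumes every log line follows the `<date> <time> - <logger> - <LEVEL> - <message>` layout of the excerpt above (check this for REF_M_live_reduction.log and livereduce.log). The function name `triage_log` and the set of severe level names are assumptions.

```python
import re
from collections import Counter

# Assumed layout, based on the excerpt above: "<date> <time> - <logger> - <LEVEL> - <message>"
LINE_RE = re.compile(r"^(?P<time>\S+ \S+) - (?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<message>.*)$")
# Assumed level names that deserve attention
SEVERE = {"WARNING", "ERROR", "CRITICAL", "FATAL"}


def triage_log(lines):
    """Count rocking-curve entries and collect WARNING-or-worse lines (with their continuations)."""
    benign = Counter()
    severe = []
    in_severe = False
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Continuation line (e.g. a traceback body): keep it with a preceding severe entry
            if in_severe:
                severe.append(line.rstrip())
            continue
        level, message = match["level"], match["message"]
        in_severe = level in SEVERE
        if in_severe:
            severe.append(line.strip())
        elif message in ("Run paused", "Run resumed"):
            benign[message] += 1
        elif message.startswith(("Scan Start:", "Scan Stop:")):
            benign[message.split(":")[0]] += 1
    return benign, severe
```

For example, `with open("/SNS/REF_M/shared/livereduce/REF_M_live_reduction.log") as log: benign, severe = triage_log(log)`. Many pause, resume, and scan entries with an empty `severe` list point to the rocking-curve case rather than a failure.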