Operational Readiness Review Checklist for Go-Live
A go-live should feel predictable, not exciting. An Operational Readiness Review (ORR) is the final check that a ground station, service, or new mission integration is ready to run safely, meet expectations, and recover from common failures. This checklist is designed to be practical: it focuses on what teams actually need before they start running real passes with real consequences.
Table of contents
- What an Operational Readiness Review Is
- Scope and Success Criteria
- Site and Facilities Readiness
- RF Chain and Antenna Readiness
- Timing, Frequency, and Synchronization
- Software Systems and Configuration Control
- Networking, Backhaul, and Data Delivery
- Security, Access, and Change Management
- Operations Procedures and Training
- Monitoring, Alerting, and On-Call
- Testing, Acceptance, and Evidence
- Launch-Day Plan and Post-Go-Live Stabilization
- Glossary: ORR Terms
What an Operational Readiness Review Is
An ORR is a structured review that confirms your team can operate a system reliably, not just that it works in a test. It is the bridge between engineering completion and day-to-day operations. A good ORR catches gaps that are easy to miss during buildout, like missing spares, unclear escalation paths, or monitoring that does not reflect real mission risk.
The ORR should end with a clear decision: go, go with constraints, or no-go, along with the actions needed to close any open items.
Scope and Success Criteria
Start by defining what is included in the go-live. A ground station may support multiple missions, multiple bands, and multiple operational modes. The review is easier to run when the scope is explicit.
- Mission scope: which spacecraft, which services (TT&C, payload downlink, gateway), which bands.
- Operational scope: planned hours, staffing model, manual vs assisted vs lights-out.
- Service commitments: pass success targets, delivery timelines, reporting requirements.
- Acceptance definition: what success looks like for a “good pass” and a “good delivery.”
It helps to document a small set of measurable success criteria. Examples include minimum contact success rate, minimum delivery completeness, and maximum acceptable time to restore service after a typical failure.
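One way to keep these criteria measurable is to compute them from rehearsal pass records. The sketch below is illustrative only: the `PassRecord` fields, the `evaluate_criteria` helper, and the 95% and 99% thresholds are assumptions, not values this checklist prescribes.

```python
from dataclasses import dataclass


@dataclass
class PassRecord:
    contact_ok: bool        # acquisition and lock held for the planned contact
    bytes_expected: int     # volume the mission planned to downlink
    bytes_delivered: int    # volume confirmed at the receiving system


def evaluate_criteria(passes, min_success_rate=0.95, min_completeness=0.99):
    """Return (measured value, meets target) for each criterion.

    The default thresholds are placeholders; use the mission's own targets.
    """
    if not passes:
        raise ValueError("no pass records to evaluate")
    success_rate = sum(p.contact_ok for p in passes) / len(passes)
    expected = sum(p.bytes_expected for p in passes)
    delivered = sum(p.bytes_delivered for p in passes)
    completeness = delivered / expected if expected else 0.0
    return {
        "contact_success_rate": (success_rate, success_rate >= min_success_rate),
        "delivery_completeness": (completeness, completeness >= min_completeness),
    }


# Made-up rehearsal data: three good contacts and one failed contact.
rehearsal = [
    PassRecord(True, 1_000, 1_000),
    PassRecord(True, 2_000, 1_990),
    PassRecord(False, 1_000, 0),
    PassRecord(True, 1_000, 1_000),
]
for name, (value, ok) in evaluate_criteria(rehearsal).items():
    print(f"{name}: {value:.3f} {'OK' if ok else 'BELOW TARGET'}")
```

With this sample data both criteria fall below target, which is the kind of result an ORR should surface before go-live rather than after.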
Site and Facilities Readiness
Facilities issues can quietly break operations. Even a perfectly configured RF chain can fail if the site cannot maintain stable power, protect equipment, or support safe access.
- Power: UPS health, generator readiness, fuel plan, and tested transfer behavior.
- Environmental control: temperature and humidity within equipment limits, ventilation and filtration.
- Physical security: controlled access, logging of entry, and secure equipment rooms.
- Safety: clear site safety rules, antenna movement safety, and lockout procedures where needed.
- Spare parts storage: protected storage location with an inventory list.
- Site access plan: how technicians reach the site under normal and emergency conditions.
If your station depends on remote hands, confirm response times, availability windows, and the exact scope of what they can do without special approvals.
RF Chain and Antenna Readiness
Go-live requires confidence that the station can acquire, track, and maintain links across expected pass conditions. This includes both the physical antenna system and the RF components behind it.
- Antenna pointing: verified pointing model, recent calibration, and repeatable acquisition behavior.
- Tracking modes: tested program track and any signal-based tracking used in operations.
- RF chain health: converters, amplifiers, filters, and cabling verified with expected signal levels.
- Polarization: alignment checked, cross-pol isolation verified where applicable.
- Reference measurements: baseline noise floor and typical carrier levels recorded for comparison.
- Spare strategy: identified critical spares and documented swap procedures.
Include evidence from representative passes: short passes, low-elevation passes, and high-rate downlinks if those are in scope.
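Recorded baselines are most useful when later passes are compared against them automatically. The following is a minimal sketch of that comparison; the metric names, the sample values, and the 1.5 dB tolerance are hypothetical and should be replaced with figures from your own station.

```python
def compare_to_baseline(baseline, measured, tolerance_db=1.5):
    """Return {metric: deviation_db} for metrics outside the tolerance.

    A metric missing from `measured` is reported with a deviation of None.
    """
    out_of_tolerance = {}
    for name, baseline_value in baseline.items():
        if name not in measured:
            out_of_tolerance[name] = None
            continue
        deviation = measured[name] - baseline_value
        if abs(deviation) > tolerance_db:
            out_of_tolerance[name] = round(deviation, 2)
    return out_of_tolerance


# Hypothetical values recorded during commissioning and during a later pass.
baseline = {"noise_floor_dbm": -110.0, "carrier_level_dbm": -75.0}
measured = {"noise_floor_dbm": -109.6, "carrier_level_dbm": -78.2}
print(compare_to_baseline(baseline, measured))
```

Here the noise floor is within tolerance, while the carrier level is flagged because it sits 3.2 dB below the baseline.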
Timing, Frequency, and Synchronization
Timing and frequency issues can look like “random modem instability” unless they are explicitly tested. A station should have a clear, monitored reference and a plan for how systems behave if that reference degrades.
- Reference availability: stable frequency reference distributed to dependent equipment.
- Holdover expectations: how long systems remain within acceptable stability if the reference input is lost.
- Time sync: consistent timestamps across logs, pass records, and data products.
- Monitoring: alarms for reference loss, drift, and out-of-tolerance behavior.
Confirm that timing issues are visible in monitoring and that operators know what actions to take when alarms occur.
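Holdover expectations can be turned into a number with some simple arithmetic. The sketch below assumes a simplified oscillator model in which time error grows as x(t) = y0*t + 0.5*a*t^2, where y0 is the fractional frequency offset at the moment the reference is lost and a is a constant linear aging rate. It ignores temperature effects and noise, and every numeric value in it is a made-up example rather than a specification.

```python
import math


def holdover_seconds(budget_s, freq_offset, aging_per_day):
    """Seconds until accumulated time error reaches budget_s.

    Worst case: offset and aging are assumed to push in the same direction.
    freq_offset is fractional (dimensionless); aging_per_day is fractional
    frequency change per day.
    """
    y0 = abs(freq_offset)
    a = abs(aging_per_day) / 86_400.0  # convert aging to per-second
    if a == 0:
        return math.inf if y0 == 0 else budget_s / y0
    # Positive root of 0.5*a*t^2 + y0*t - budget_s = 0
    return (-y0 + math.sqrt(y0 * y0 + 2 * a * budget_s)) / a


# Hypothetical example: 1 microsecond budget, 1e-11 initial offset,
# 1e-10 per day aging.
t = holdover_seconds(1e-6, 1e-11, 1e-10)
print(f"holdover: {t / 3600:.1f} hours")
```

With these example inputs the budget is used up after a little over nine hours. The useful ORR question is whether that figure, computed from your own equipment data, is longer than the time it realistically takes to restore the reference.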
Software Systems and Configuration Control
Go-lives often fail because of software drift: the station works on one machine but not on another, or the “known good” configuration cannot actually be reproduced. Treat configuration as an operational asset.
- Configuration baselines: documented station profiles for each mission and mode.
- Version control: tracked changes for automation rules, profiles, and scripts.
- Release process: how updates are tested, approved, and rolled back.
- Access boundaries: who can change what, and how changes are reviewed.
- Timeboxed freezes: change freeze periods around go-live and critical events.
If multiple stations or environments exist, confirm they are consistent enough that operators can switch between them without re-learning basic workflows.
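A simple way to check that a deployed profile still matches its baseline is to fingerprint both and list the keys that differ. This is a minimal sketch, assuming profiles can be represented as flat JSON-serializable dictionaries; the function names and the profile keys are hypothetical.

```python
import hashlib
import json

_MISSING = object()  # sentinel so a missing key differs from a None value


def config_fingerprint(profile):
    """SHA-256 of a canonical JSON rendering, independent of key order."""
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_keys(baseline, deployed):
    """Sorted list of keys whose values differ or exist on one side only."""
    keys = set(baseline) | set(deployed)
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != deployed.get(k, _MISSING)
    )


# Hypothetical station profile for one mission and mode.
baseline = {"modem_profile": "qpsk-r12", "downlink_mhz": 8200.0, "autotrack": True}
deployed = {"modem_profile": "qpsk-r12", "downlink_mhz": 8212.5, "autotrack": True}

if config_fingerprint(baseline) != config_fingerprint(deployed):
    print("drift detected in:", drifted_keys(baseline, deployed))
```

The fingerprint gives a quick yes or no answer that is easy to log and compare across stations, and the key list tells the operator where to look.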
Networking, Backhaul, and Data Delivery
Many missions define success as “data delivered,” not “carrier locked.” The ORR should include end-to-end validation: capture to storage to delivery to the receiving system, with clear handling of retries and failures.
- Backhaul capacity: enough throughput for peak downlink volumes.
- Backhaul resilience: alternate paths or fallback behavior if the primary link fails.
- Delivery workflow: clear steps from raw capture to packaged products.
- Integrity checks: checksums, completeness checks, and validation of file counts.
- Latency expectations: defined targets for “available to operators” and “delivered to end users.”
- Failure handling: retry logic with limits, queueing, and operator alerts for stuck deliveries.
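The integrity checks in the list above can be sketched as a manifest comparison. This example assumes the capture side produces a manifest that maps each file name to its expected SHA-256 digest; that manifest format and the `verify_delivery` helper are illustrative assumptions, not part of any particular delivery system.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large captures do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_delivery(directory, manifest):
    """Compare delivered files against a {file name: sha256 hex} manifest.

    Returns a list of problems; an empty list means the file set is
    complete and every checksum matches.
    """
    directory = Path(directory)
    present = {p.name for p in directory.iterdir() if p.is_file()}
    expected = set(manifest)
    problems = [f"missing: {name}" for name in sorted(expected - present)]
    problems += [f"unexpected: {name}" for name in sorted(present - expected)]
    for name in sorted(present & expected):
        if sha256_of(directory / name) != manifest[name]:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

Running the same check on both the sending and receiving side gives a shared definition of a “good delivery” that covers file counts, completeness, and content.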
If your delivery depends on downstream systems you do not control, confirm the handoff is tested and the receiving side can confirm receipt.
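Retry logic with limits, combined with receipt confirmation, might look like the sketch below. It assumes a caller-supplied `send` function that returns True only when the receiving side has confirmed receipt; the attempt limit, the delays, and the `DeliveryStuck` exception are hypothetical choices for illustration.

```python
import time


class DeliveryStuck(Exception):
    """Raised when the retry limit is reached; this should alert an operator."""


def deliver_with_retries(send, max_attempts=5, base_delay_s=2.0,
                         max_delay_s=60.0, sleep=time.sleep):
    """Call send() until it confirms receipt or the attempt limit is hit.

    Returns the number of attempts used. Delays double after each failure
    and are capped at max_delay_s.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
    raise DeliveryStuck(f"no receipt confirmation after {max_attempts} attempts")


# Demo with a sender that fails twice, then succeeds (no real waiting).
outcomes = iter([False, False, True])
used = deliver_with_retries(lambda: next(outcomes), sleep=lambda s: None)
print(f"delivered after {used} attempts")
```

The important design point is the hard limit: a delivery that retries forever looks healthy from the outside, while one that raises after a bounded number of attempts becomes visible to operators.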
Security, Access, and Change Management
Security readiness is not just about tools. It is about preventing avoidable incidents and ensuring accountability when something goes wrong. Controls should match the station’s actual risk profile, especially if the station can transmit.
- Account model: individual accounts, role-based access, and removal of shared credentials.
- Authentication: strong authentication for remote access and privileged actions.
- Segmentation: separation between control systems, operator systems, and external connectivity.
- Logging: audit logs for logins, configuration changes, and key operational actions.
- Transmission controls: clear enablement process and interlocks for uplink paths.
- Change management: documented process, approvals for sensitive changes, rollback plans.
Confirm that security controls do not block urgent operational actions. When controls get bypassed under pressure, it usually means the design does not fit the job.
Operations Procedures and Training
Procedures are how you turn a working system into a reliable service. Operators need clear, tested runbooks for normal operations and for common anomalies. Training should include both “how to run a pass” and “how to handle surprises.”
- Standard operating procedures: pass execution steps, verification steps, and reporting steps.
- Anomaly runbooks: late acquisition, weak signal, lock instability, backhaul outage, and storage pressure.
- Escalation paths: who to call, when to call, and what information to provide.
- Shift handover: a consistent handoff format for open issues and upcoming critical passes.
- Training evidence: a record that operators practiced the core workflows and exception handling.
When possible, practice the go-live flow during a rehearsal window that mirrors real conditions, including timing, staffing, and expected pass volume.
Monitoring, Alerting, and On-Call
Monitoring is part of the product. A station can be “up” while silently failing to deliver data. The ORR should confirm that monitoring measures what matters and alerts are actionable.
- Health monitoring: power, network, storage, services, timing reference, and RF chain signals.
- Workflow monitoring: pass started, acquisition achieved, data captured, validation passed, delivery completed.
- Alert quality: alerts include context and recommended actions, not just “something is wrong.”
- On-call coverage: defined schedule, response expectations, and backup contacts.
- Escalation rules: when an alert is a warning vs an incident requiring immediate action.
A practical test is to simulate a few failures (like backhaul loss or service crash) and confirm that alerts trigger quickly and guide the responder to the right decision.
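Workflow monitoring can be sketched as a check for the first stage a pass failed to reach. The stage names below mirror the list above, but the event format and the severity rule are assumptions made for illustration: a stall after data capture is treated as a warning because the data still exists on site, and a stall before capture is treated as an incident.

```python
STAGES = [
    "pass_started",
    "acquisition_achieved",
    "data_captured",
    "validation_passed",
    "delivery_completed",
]


def stalled_pass_alert(pass_id, completed):
    """Return an alert dict for the first stage not reached, or None if all done."""
    done = set(completed)
    for index, stage in enumerate(STAGES):
        if stage not in done:
            # Hypothetical severity rule: data already captured means the
            # pass can still be recovered by re-running later stages.
            data_safe = index > STAGES.index("data_captured")
            return {
                "pass_id": pass_id,
                "stalled_before": stage,
                "last_completed": STAGES[index - 1] if index > 0 else None,
                "severity": "warning" if data_safe else "incident",
            }
    return None


print(stalled_pass_alert("pass-001", ["pass_started", "acquisition_achieved", "data_captured"]))
```

An alert built this way carries context (which pass, which stage, what completed last) instead of a bare “something is wrong,” which is what makes it actionable for the responder.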
Testing, Acceptance, and Evidence
An ORR should produce clear evidence that the system meets go-live criteria. Evidence reduces debate during go-live and makes future troubleshooting faster.
- Acceptance tests: defined tests for acquisition, downlink, delivery, and reporting.
- Representative conditions: tests include low elevation, typical data rates, and expected interference conditions.
- Performance baselines: expected link metrics and typical ranges captured and documented.
- Recovery tests: at least one restore or rollback path proven in practice.
- Open items list: known gaps documented with owners and due dates.
If you cannot produce evidence for a critical requirement, treat it as a risk: either address it before go-live or explicitly accept it with constraints.
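The open items list can feed the go, go with constraints, or no-go decision directly. The sketch below shows one possible rule, not a standard: any critical item without an explicitly accepted constraint blocks go-live, and any accepted constraint downgrades a go to a go with constraints. The `OpenItem` fields and the sample items are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OpenItem:
    title: str
    owner: str
    due: date
    critical: bool
    accepted_constraint: str = ""  # non-empty when the risk was explicitly accepted


def orr_decision(items):
    """Map an open items list to one of the three ORR outcomes."""
    if any(i.critical and not i.accepted_constraint for i in items):
        return "no-go"
    if any(i.accepted_constraint for i in items):
        return "go with constraints"
    return "go"


# Made-up example items.
items = [
    OpenItem("Generator transfer test not repeated", "facilities",
             date(2030, 1, 15), critical=True,
             accepted_constraint="staffed passes only until retest"),
    OpenItem("Runbook screenshots outdated", "operations",
             date(2030, 2, 1), critical=False),
]
print(orr_decision(items))
```

Writing the rule down, even in this simple form, forces the team to agree in advance on what counts as blocking, which reduces debate on the day of the decision.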
Launch-Day Plan and Post-Go-Live Stabilization
A go-live plan reduces uncertainty on the day you switch from test to production. It also helps avoid risky last-minute changes. A good plan defines roles, communications, and fallback options.
- Go-live window: start time, end time, and which passes are “in scope” for the cutover.
- Roles: who is operating, who is approving changes, who is monitoring.
- Communication plan: what updates are shared, how often, and who needs them.
- Change freeze: limit changes during the go-live window except for pre-approved fixes.
- Fallback plan: what happens if the go-live fails, including rollback and alternate coverage options.
- Stabilization period: a defined period after go-live to prioritize reliability fixes and process improvements.
Stabilization is where reliability is earned. Plan time to review early incidents, tune monitoring, and update runbooks based on real operations.
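The change-freeze rule from the launch-day list can be enforced with a small guard in whatever tooling applies changes. This is a minimal sketch, assuming timezone-aware UTC timestamps and a boolean flag for pre-approved fixes; the function name and the example window are hypothetical.

```python
from datetime import datetime, timezone


def change_allowed(now, freeze_start, freeze_end, pre_approved=False):
    """Allow a change outside the freeze window, or inside it if pre-approved."""
    in_freeze = freeze_start <= now < freeze_end
    return pre_approved or not in_freeze


# Hypothetical go-live window.
freeze_start = datetime(2030, 3, 1, 6, 0, tzinfo=timezone.utc)
freeze_end = datetime(2030, 3, 3, 18, 0, tzinfo=timezone.utc)
now = datetime(2030, 3, 2, 12, 0, tzinfo=timezone.utc)

print(change_allowed(now, freeze_start, freeze_end))                      # inside freeze
print(change_allowed(now, freeze_start, freeze_end, pre_approved=True))  # approved fix
```

Keeping all three timestamps in UTC avoids ambiguity when the site, the operators, and the mission team sit in different time zones.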
Glossary: ORR Terms
Operational Readiness Review (ORR)
A structured review that confirms a system and team are ready to operate reliably in production.
Go-live
The point when a system transitions from testing to production operations with real mission impact.
Acceptance criteria
The measurable requirements that must be met for a go-live decision.
Runbook
A step-by-step guide for operating a system and responding to common anomalies and incidents.
Change freeze
A planned period when changes are limited to reduce risk during critical operational windows.
Escalation
The process of involving additional responders or decision-makers when an issue exceeds normal operator handling.
Stabilization period
A planned period after go-live focused on reliability improvements and operational tuning based on early production behavior.