14. Operate a reliable service
Minimise service downtime and have a plan to deal with it when it does happen.
Maximise uptime and speed of response for the online part of the service
- We have and run an automated CI/CD pipeline.
- We use GitHub Actions to manage deployments.
- We adjusted our compute capacity on GOV.UK PaaS several times to keep the service fast, and will continue to do so now that we have migrated the service to AWS.
- Typically, when a performance issue appears we increase compute capacity, then refactor if we can, then decrease capacity again.
Be able to deploy software changes regularly, without significant downtime (for example, by minimising the effort involved in creating new environments and populating pre-production environments with test data)
- We work in an agile way, with two-week sprints and a prioritised backlog of features. This enables us to deploy changes regularly and in response to user needs and feedback.
- We have an automated CI/CD pipeline.
- We use GitHub Actions to create deployments by appliance.
- We will make releases to the production environment regularly after thorough testing in the development and test environments.
- We use a mixture of manual and automated testing.
- When releases are made to the production environments, there will be a short period of downtime (around 10 minutes). During this time the ‘There is a problem with the service’ page will be shown.
- We have made over 1000 deployments in public beta across all environments.
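The short downtime window during a production release could be bounded by a post-deployment check that polls the service until it responds again. The following is a minimal sketch in Python, not the team's actual pipeline step: the health URL, the 15-minute limit and the function names are all assumptions made for illustration.

```python
import time
import urllib.request
from typing import Callable


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    max_wait_seconds: float = 900.0,   # assumed limit, above the ~10 minute window
    interval_seconds: float = 15.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it succeeds or `max_wait_seconds` has elapsed."""
    deadline = clock() + max_wait_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)


# Example use after a release (hypothetical URL):
#   ok = wait_until_healthy(lambda: is_healthy("https://service.example.invalid/health"))
#   if not ok: fail the deployment job so the team is notified.
```

Passing `sleep` and `clock` as parameters keeps the polling logic testable without real waiting or network access.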
Carry out quality assurance testing regularly
- Limited automated testing is in place (unit and front end); we are expanding this.
- Browser-driven testing.
- Penetration testing has been completed three times: December 2021, February 2022 (with a retest in March 2022) and September 2023.
- All new releases and deployments are tested in the development environment before release. This ensures that bugs are identified and fixed prior to production.
Test the service in an environment that’s as similar to live as possible
- The live private beta phase ran from January to March 2022. This included a range of future users: operators, UKSA employees and orbital analysts.
- Feedback was gathered throughout this phase and used to inform the progressive improvement of the service during public beta.
- We are now in our public beta, with 12 operators using the service (as of December 2022). We continue to test deployments in the development environment thoroughly before making releases to the live environment.
Have appropriate monitoring in place, together with a proportionate, sustainable plan to respond to problems identified by monitoring (given the impact of problems on users and on government)
- We considered ‘bad actor’ involvement. The main problem we identified was one in which an analyst submits accidentally or purposefully incorrect analysis that increases the chance of a collision.
- We have mitigated against this via procuring third-party monitoring services, including:
  - Logit, used to log and monitor errors or system crashes.
  - Piwik Pro, which will be used to track front-end analytics.
- Any alerts will be raised to users via a Slack channel and to UKSA administrators via email.
- There will also be a feedback form linked on the website, allowing users to report back on any issues. The service team will receive this feedback and use it to prioritise additional features, fixing bugs and post-MVP requirements.
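The alert routing described above (Slack for users, email for UKSA administrators) can be sketched as follows. This is an illustrative sketch only: it assumes a Slack incoming webhook and a plain SMTP relay, and the webhook URL, addresses, host and function names are hypothetical rather than taken from the service.

```python
import json
import smtplib
import urllib.request
from email.message import EmailMessage
from typing import List


def build_slack_request(webhook_url: str, summary: str) -> urllib.request.Request:
    """Build the POST request for a Slack incoming webhook (JSON body with a `text` field)."""
    body = json.dumps({"text": summary}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_admin_email(sender: str, recipients: List[str], summary: str, detail: str) -> EmailMessage:
    """Build the email sent to administrators for the same alert."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"Service alert: {summary}"
    message.set_content(detail)
    return message


def raise_alert(summary: str, detail: str, webhook_url: str, sender: str,
                recipients: List[str], smtp_host: str, smtp_port: int = 25) -> None:
    """Send one alert to both channels (hypothetical wiring, assumes an SMTP relay)."""
    with urllib.request.urlopen(build_slack_request(webhook_url, summary), timeout=10):
        pass  # Slack replies with a short body on success; nothing to parse here.
    with smtplib.SMTP(smtp_host, smtp_port, timeout=10) as smtp:
        smtp.send_message(build_admin_email(sender, recipients, summary, detail))
```

Separating message construction from sending means the content of each alert can be checked in tests without contacting Slack or a mail server.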
Actively work towards fixing any organisational or contractual issues which make it difficult to maximise availability (for example, by agreeing a common set of languages, tools, and ways of working for technical staff - either informally, or through something more formal like the GDS way)
- GDS service standards were used throughout the project.
- Jira is used to track tickets and tasks to be completed, with regular planning and backlog grooming sessions to prioritise and discuss tasks.
- GitHub has been used to deploy changes and store code.
- Tech spikes ran throughout, in which everyone involved took part in a joint ‘problem solve’ and agreed an approach. This ensures everyone is on the same page and knows how we plan to address the issue.
- No known further issues.
This page was last reviewed on 7 December 2022.
It needs to be reviewed again on 7 June 2023.