Production readiness checklist: ensuring smooth deployments

March 13, 2025

Editor's note: This post was updated 13 March 2025 to include new relevant information.

What is production readiness?

The idea of production readiness originates from Google’s SRE book. Production readiness is a software engineering process that ensures that specific software components meet the security, reliability, and performance standards that will provide the best possible experience for users. This can include ensuring that software components have been end-to-end tested, that production environments can access additional disk space under high user load, or that a public API endpoint exposes only what the user needs to see.

Performing production readiness checks before a release can help reduce the chances of downtime, minimize the number of critical incidents or failures, and provide users with a better experience. But production readiness is also a continuous process, one that helps SREs and other developers monitor live production environments and quickly address service degradations. Production readiness also comes into play as the needs of the organization’s production engineering environment change and mature.

Both the pre- and post-release processes rely on numerous factors, each unique to individual engineering organizations, that all play a role in making software production-ready. 

Core components of a production readiness checklist

Production readiness should touch on everything that happens in engineering, from writing code to managing post-production ops. Its holistic nature and requirements call for a high attention to detail and alignment across multiple teams within the engineering organization.

Among others, a few common standards for production readiness are:

  • Peer reviewing code
  • Testing code thoroughly 
  • Setting up monitoring
  • Implementing security and access controls
  • Including documentation with each release or update
  • Following the correct deployment workflows

Examples of production readiness standards

If you want to build a comprehensive production readiness checklist for a service that addresses multiple factors, you’ll want to include multiple types of checks for each area of concern. Below, we’ll list some examples of checks you’ll want to perform in each area.

  • Security: 
    • Ensure you are connected to and able to conduct vulnerability scans.
    • Identify vulnerabilities through a security audit.
    • Ensure you have SLOs and maximums set for vulnerabilities.
    • Put role-based access controls in place.
    • Ensure authentication and authorization methods are in place for each service.
    • Make sure secrets are properly managed.
    • Run static application security testing (SAST), using tools like Snyk, to monitor code in the CI/CD pipeline.
    • Perform penetration tests and dynamic application security testing (DAST) at the appropriate times.
    • Use scanning tools to check that all dependencies are on the correct versions.
    • Implement data encryption, for both data at rest and in transit.
    • Verify compliance with industry security standards.
    • Check for other common malicious activities.
  • Scalability:
    • Ensure the architecture is designed to handle increased loads efficiently.
    • Stress-test the application’s components to check their rate limits.
    • Check whether your application can handle user or data growth.
    • Monitor performance against your SLOs, paying close attention to any specific contractual commitments to your clients or vendors.
    • Establish performance benchmarks, and then check that these are met.
    • Automate the CI/CD release process to enhance scalability.
    • Run automated unit and integration tests, and require them to pass a set threshold.
  • Reliability:
    • Define and monitor compliance with service-level objectives (SLOs), service-level indicators (SLIs) and service-level agreements (SLAs).
    • Ensure disaster recovery plans are documented and tested.
    • Keep regular data backups that can be restored if needed.
    • Ensure redundancy mechanisms are in place to prevent downtime.
    • Include automated rollback capabilities to revert to a stable version if needed.
  • Observability:
    • Implement monitoring with comprehensive KPI and health metrics, logging, and tracing.
    • Ensure you are alerted via your preferred method (Slack, email, etc.) when the status of your services changes because of breached thresholds or inconsistencies.
    • Use dashboards for real-time status updates and insights.
    • Use logging for incidents and errors.
  • Ownership:
    • Identify owners of services and components, and include easily discoverable contact information and methods for them should their services fail during off hours.
    • Map upstream and downstream dependencies.
    • Identify and make discoverable related teams, stakeholders, and team members.
  • Incident management:
    • Ensure runbooks have been documented and are accessible.
    • Assign on-call responsibilities for incidents.
    • Designate owning teams for each service.
    • Establish escalation policies.
    • Test the incident response process with a drill that includes members of your team.
    • Ensure on-call engineers can easily find the information they need during resolution.

Addressing these areas before, during, and after software is deployed will ensure that your software is production-ready, capable of meeting user demands and maintaining reliability throughout its lifecycle.
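To make the areas above concrete, here is a minimal sketch of a checklist expressed as data, with each check written as a small predicate over a service's metadata. The metadata fields (`owner`, `secrets_manager`, `dr_plan_url`, and so on), the service name, and the handful of checks chosen are all assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical service metadata: a plain dict whose keys are assumptions.
Service = dict


@dataclass(frozen=True)
class Check:
    area: str                          # checklist area, e.g. "Security"
    name: str                          # human-readable description
    passes: Callable[[Service], bool]  # returns True when the check passes


CHECKLIST = [
    Check("Security", "Secrets are managed by a secrets manager",
          lambda s: s.get("secrets_manager") is not None),
    Check("Reliability", "Disaster recovery plan is documented",
          lambda s: bool(s.get("dr_plan_url"))),
    Check("Observability", "Alerting channel is configured",
          lambda s: bool(s.get("alert_channel"))),
    Check("Ownership", "Owning team is set",
          lambda s: bool(s.get("owner"))),
    Check("Incident management", "Runbook is linked",
          lambda s: bool(s.get("runbook_url"))),
]


def failed_checks(service: Service, checklist=CHECKLIST) -> dict[str, list[str]]:
    """Group the names of failing checks by checklist area."""
    failures: dict[str, list[str]] = {}
    for check in checklist:
        if not check.passes(service):
            failures.setdefault(check.area, []).append(check.name)
    return failures


# Example service: no disaster recovery plan and an empty runbook link.
example_service = {
    "owner": "team-checkout",
    "secrets_manager": "vault",
    "alert_channel": "#checkout-alerts",
    "runbook_url": "",
}
report = failed_checks(example_service)
```

Keeping checks as data makes it straightforward to add, remove, or reorder them as the checklist evolves, and the grouped report shows at a glance which area needs attention.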

It’s important to note that not all services need to track every metric listed; some may include fewer if the service is not customer-facing, while additional metrics might be needed for specific situations, like FinOps, or meeting specific Kubernetes standards or application security standards, which aren't always part of SRE activities.

Production readiness vs. product readiness

Though production readiness is an extrapolation of “product readiness,” they are separate terms with different definitions. Product readiness is an earlier stage in a similar evaluation process that occurs within the product management team. Sometimes called the “definition of done,” product readiness usually focuses on making a feature or software component as complete as possible before it is “ready” for release.

But nothing in the software world is ever truly done: once something is deployed, it must be continuously and consistently reviewed, monitored, and maintained. This is where the concepts diverge. Production readiness continues for as long as the software runs in production, whereas product development concludes its initial phase before a feature is deployed, and any subsequent updates or changes to it are handled in a separate, contained process.

For example, if a service was production-ready when it was scaffolded, it won’t necessarily remain that way because requirements change and services (and their components) can degrade over time. 
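One way to see this kind of degradation is to track how much of a service's error budget remains against its SLO. The sketch below assumes an availability-style SLO measured as successful requests over total requests; the 99.9% target and the request counts are made-up numbers for illustration.

```python
def error_budget_remaining(slo: float, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo    -- target success ratio, e.g. 0.999
    total  -- requests served in the measurement window
    failed -- requests that failed in the same window
    """
    allowed_failures = (1.0 - slo) * total
    if allowed_failures <= 0:
        raise ValueError("need slo < 1.0 and total > 0")
    return 1.0 - failed / allowed_failures


def within_slo(slo: float, total: int, failed: int) -> bool:
    """A service stays within its SLO while some error budget is left."""
    return error_budget_remaining(slo, total, failed) > 0


# Same service, same SLO, two different measurement windows.
at_launch = error_budget_remaining(0.999, 1_000_000, 250)       # about 0.75 left
months_later = error_budget_remaining(0.999, 1_000_000, 1_400)  # overspent
```

The service in this example met its SLO comfortably at launch and later overspent its budget without any change to the checklist itself, which is why the check has to be repeated rather than run once.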

Importance of a production readiness checklist

A production readiness checklist is exactly what it sounds like — a ready-made list of everything that you need to check about your software for production readiness.

Ensuring that software is production-ready is closely tied to software standardization: together, they comprise the necessary steps to ensure smooth operation in a live environment. There are different ways to make sure the production readiness checklist actually gets reviewed:

  • A DevOps engineer who is performing an action (for instance, scaffolding a service) can manually review production readiness.
  • A developer writing code can contribute to production readiness by ensuring that the feature is properly documented.
  • Teams can keep manual lists, stored in Excel sheets, Jira, or Confluence.
  • Teams can run automated checks, such as scorecards or self-service actions in an internal developer portal.

Though reviewing and maintaining lists like these may seem easy, it involves significant manual work. The 2025 State of Internal Developer Portals revealed that just 6% of engineers update software asset metadata such as documentation and proper monitoring setups on a daily basis, while over 40% of developers only do so once a week. 

If developers are already avoiding tedious manual work like this in existing environments, it presents an issue when more work is needed to maintain production-ready services that have already been deployed. If teams are struggling to keep up with or follow existing guidelines for new software projects, their workload will only compound when issues arise after deployment.

A software outage or breach in production could have a hugely detrimental impact on your business’s reputation and bottom line. A clear, consistent production readiness checklist is important because it helps your engineering teams release high-quality software: services that are resilient, secure, and performant.

If these guidelines aren’t followed, you can also see negative consequences arise internally. Backlogs and tech debt can rise as services degrade, and if your teams are focused on keeping existing systems afloat, they will have less available bandwidth to create new services or features, or improve processes.

Getting started with a production readiness checklist 

Using a production readiness checklist is one way you can improve your end-user experience, which will help you retain (and grow) customer trust and revenue. Before we get into what your list should include, let’s cover what the checklist should and should not be:

The checklist should not be static

The software development lifecycle is continually evolving, and any new frameworks, languages, dependencies, and technologies must be factored into your checks. Review and refine your existing checklist ahead of time: just like the definition of done in product management, the contents of your production readiness checklist should be reviewed as the first step of your production readiness check.

Even if you’ve ensured there are no bugs or vulnerabilities in your code before deployment, you’ll need to enable tools or other checks that will alert you when new vulnerabilities appear in production, and ensure that there is a clear, documented approach to resolving such vulnerabilities embedded in the organization’s standards. This is where ongoing review of production readiness comes into play.
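As a rough sketch of what such an ongoing check might look like, the snippet below compares the dependency versions already running in production against a list of advisories published after deployment. The package names, service names, and advisory format are hypothetical, and the version parser only handles plain dotted numeric versions; a real setup would rely on a dedicated scanning tool.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.4.1' into (2, 4, 1). Assumes plain dotted numeric versions."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical advisories: package name -> first version containing the fix.
ADVISORIES = {"examplelib": "2.4.1"}

# Hypothetical inventory of what each deployed service currently runs.
DEPLOYED = {
    "checkout-api": {"examplelib": "2.3.0", "otherlib": "1.0.0"},
    "catalog-api": {"examplelib": "2.4.1"},
}


def affected_services(deployed, advisories):
    """List (service, package, installed, fixed_in) for vulnerable deployments."""
    hits = []
    for service, dependencies in sorted(deployed.items()):
        for package, fixed_in in advisories.items():
            installed = dependencies.get(package)
            if installed is None:
                continue  # service does not use this package
            if parse_version(installed) < parse_version(fixed_in):
                hits.append((service, package, installed, fixed_in))
    return hits


findings = affected_services(DEPLOYED, ADVISORIES)
```

Running a comparison like this on a schedule, rather than only at release time, is what turns a one-off pre-deployment scan into a continuous readiness check.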

Automated production readiness checks should not be skipped

While the whole idea of using a checklist sounds like a manual approach in itself, you can automate many aspects of the list to improve efficiency and accuracy. The only truly manual task is compiling the list itself and verifying with all stakeholders that it makes sense.

When it comes to conducting the checks themselves, each organization will have its own approach, but it’s clear that manual checks, whether in spreadsheets, project management software, or Configuration Management Databases (CMDBs), are inefficient and may not be up to date, which can erode the trust that engineers have in the process.

Automation also enables these checks to go a step further: providing alerts when issues arise, enforcing policies, triggering tests, and validating configurations, all of which make the process more efficient and reliable.
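The alerting part can be sketched as a comparison between two consecutive runs of the same checks, so that people are notified only when something changes. The check names and the message format below are assumptions; the `send` parameter stands in for whatever channel (Slack, email, and so on) an organization actually uses.

```python
def status_changes(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Describe checks that started failing or recovered since the last run.

    Both arguments map a check name to True (passing) or False (failing).
    A failing check that was absent from the previous run is also reported.
    """
    messages = []
    for name in sorted(current):
        before = previous.get(name)  # None when the check is new
        now = current[name]
        if now and before is False:
            messages.append(f"RECOVERED: {name}")
        elif not now and before is not False:
            messages.append(f"FAILING: {name}")
    return messages


def notify(messages: list[str], send=print) -> None:
    """Deliver each message through the given channel (print by default)."""
    for message in messages:
        send(message)


last_run = {"tls-enabled": True, "runbook-linked": False, "owner-set": True}
this_run = {"tls-enabled": False, "runbook-linked": True, "owner-set": True,
            "backups-tested": False}
changes = status_changes(last_run, this_run)
```

Reporting only transitions keeps the alert volume low: a check that has been failing for a week does not generate a new message on every run.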

No one checklist can be easily reused

Creating a production readiness checklist is challenging due to the diverse requirements of different software components (e.g., APIs vs. microservices). These standards vary based on numerous factors, including the infrastructure, underlying technology, and the role of each component within the overall engineering ecosystem.

Each organization requires its own set of production readiness metrics and checklists tailored to its unique business needs and technical environments. Consider that a highly regulated industry handling sensitive data, such as fintech software, may require much more stringent security policies, while an externally exposed service will need to prescribe specific Kubernetes standards.
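One way to handle this variation without maintaining a separate list per service is to compose each checklist from a shared base plus extras selected by the service's traits. The trait names and individual checks below are illustrative assumptions only.

```python
# Checks every service gets, regardless of its role.
BASE_CHECKS = ["code review", "owner assigned", "monitoring configured"]

# Extra checks keyed by a trait of the service (hypothetical trait names).
EXTRA_CHECKS = {
    "public-api": ["rate limiting", "authentication on every endpoint"],
    "regulated-data": ["encryption at rest", "audit logging"],
}


def checklist_for(traits: list[str]) -> list[str]:
    """Return base checks plus the extras for each trait, without duplicates."""
    checks = list(BASE_CHECKS)
    for trait in traits:
        for check in EXTRA_CHECKS.get(trait, []):
            if check not in checks:
                checks.append(check)
    return checks


internal_tool = checklist_for([])
payments_api = checklist_for(["public-api", "regulated-data"])
```

An internal tool ends up with only the base checks, while a public service handling regulated data picks up both sets of extras, so each service is held to the standards that actually apply to it.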

Where to store the production readiness checklist

Where you store your checklist matters! Make sure to share, publicize, and encourage use of your production readiness checks; if you don’t, they become harder to find, use, update, and even delete if necessary. Make them usable and findable, and encourage your teams to regularly suggest improvements to keep them helpful.

One common way that companies store their checklist is inside the related GitHub repo as a Markdown file. The benefit of this is that it is in the same space as the code to which it applies, and won’t get lost.

But the downside is that it might not be as easily updated or findable for those unfamiliar with the service, which makes it less valuable in on-call scenarios. Alternatives include spreadsheets, which, just like the checks themselves, can be a painstaking exercise to use and manually update.
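If the checklist does live in the repo as Markdown, a small script can at least surface what is still open. The sketch below assumes the file uses GitHub-style task list items (`- [ ]` and `- [x]`); the sample content is invented for illustration.

```python
import re

# Matches "- [ ] text", "- [x] text", or "* [X] text", with optional indentation.
TASK_ITEM = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def unchecked_items(markdown: str) -> list[str]:
    """Return the text of every task list item that is not ticked off."""
    items = []
    for line in markdown.splitlines():
        match = TASK_ITEM.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


SAMPLE = """\
# Production readiness
- [x] Owner assigned
- [ ] Runbook linked
  - [X] Alerts configured
* [ ] Disaster recovery tested
"""

open_items = unchecked_items(SAMPLE)
```

A script like this could run in CI to list open items on each pull request, though it still depends on someone ticking the boxes by hand, which is the limitation described above.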

Another option is to use an internal developer portal, which allows SREs and others to automate checks with scorecards. Scorecards detail what makes a piece of software production-ready, monitor and validate readiness criteria on a continuous basis, and perform checks consistently, without human error. Portals also offer initiatives, which are built-in management tools that can help you push change in your organization: if teams are not using scorecards, you can set an initiative to create them for each service and monitor adoption using dashboards. You can block deployment of services that don’t meet the scorecard requirements, and if your existing services degrade, you can opt to receive a notification identifying the service, its degradation, and the owning team.
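The scorecard idea can be sketched as a set of ordered levels, each with its own rules, plus a gate that refuses deployment below a minimum level. This is not any portal's actual API: the level names, rule names, and the `may_deploy` helper are assumptions made for illustration.

```python
# Levels ordered from lowest to highest; a service must pass every rule of a
# level, and of all levels below it, to reach that level.
LEVELS = [
    ("Bronze", ["has_owner", "has_readme"]),
    ("Silver", ["has_runbook", "has_alerts"]),
    ("Gold", ["has_slo", "dr_tested"]),
]
LEVEL_ORDER = ["Basic"] + [name for name, _ in LEVELS]


def scorecard_level(facts: dict[str, bool]) -> str:
    """Return the highest level reached; 'Basic' if no level is fully met."""
    achieved = "Basic"
    for level, rules in LEVELS:
        if all(facts.get(rule, False) for rule in rules):
            achieved = level
        else:
            break  # higher levels require this one to pass first
    return achieved


def may_deploy(facts: dict[str, bool], minimum: str = "Silver") -> bool:
    """Deployment gate: allow only services at or above the minimum level."""
    return LEVEL_ORDER.index(scorecard_level(facts)) >= LEVEL_ORDER.index(minimum)


service_facts = {
    "has_owner": True, "has_readme": True,
    "has_runbook": True, "has_alerts": False,
    "has_slo": True, "dr_tested": True,
}
level = scorecard_level(service_facts)  # missing alerts cap this at Bronze
allowed = may_deploy(service_facts)
```

Because the levels are cumulative, a service cannot skip ahead: passing the top-level rules does not help if a lower-level rule such as alerting is still unmet.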

Key takeaways

A production readiness checklist helps ensure that your services are secure, scalable, reliable, and observable. It also plays a critical role in implementing continuous integration and deployment (CI/CD), setting service level objectives (SLOs), and establishing robust disaster recovery and rollback plans. Incorporating these elements from the initial launch and throughout subsequent updates supports the ongoing health and effectiveness of your services.
