PM2 cluster mode on EC2 is still a practical option when you need predictable cost and simple operations. The key is disciplined process management, graceful shutdown, and repeatable release scripts.
1) Use an explicit PM2 ecosystem file
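As a rough illustration, an ecosystem file for cluster mode might look like the sketch below. The app name, entry script, port, memory limit, and timeouts are all placeholder assumptions, not recommendations. Note that `wait_ready` only helps if the app actually calls `process.send('ready')` once it is listening.

```javascript
// ecosystem.config.js
// Minimal sketch. Every value here is a placeholder assumption; tune for your service.
const config = {
  apps: [
    {
      name: 'api',                // hypothetical app name
      script: './server.js',      // hypothetical entry point
      exec_mode: 'cluster',       // run workers behind PM2's cluster-mode load balancing
      instances: 'max',           // one worker per CPU core; use a number to leave headroom
      max_memory_restart: '512M', // restart a worker that grows past this limit
      wait_ready: true,           // wait for process.send('ready') from the app
      listen_timeout: 10000,      // ms to wait for the ready signal before moving on
      kill_timeout: 10000,        // ms between the stop signal and a forced SIGKILL
      env_production: {
        NODE_ENV: 'production',
        PORT: 3000,
      },
    },
  ],
};

module.exports = config;
```

Keeping these settings in a versioned file, rather than passing flags on the command line, makes every start and reload use the same configuration.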
2) Graceful shutdown is mandatory
If your service does not stop cleanly, a PM2 reload can still drop connections. Listen for SIGINT and SIGTERM, and close the HTTP server so in-flight requests can finish before the process exits.
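One way to wire this up is sketched below. The helper names `createShutdown` and `start`, the 8-second timeout, and the trivial request handler are assumptions for illustration. The forced-exit timer covers the case where keep-alive or stuck connections stop `server.close` from completing; it should stay below whatever `kill_timeout` PM2 is configured with.

```javascript
const http = require('node:http');

// Returns a signal handler that stops accepting new connections, waits for
// in-flight requests to finish, then exits. `exit` is injectable for testing.
function createShutdown(server, { timeoutMs = 8000, exit = process.exit } = {}) {
  let shuttingDown = false;
  return function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`${signal} received, closing HTTP server`);

    // Force a non-zero exit if connections prevent a clean close in time.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    server.close((err) => {
      clearTimeout(timer);
      exit(err ? 1 : 0);
    });
  };
}

// Hypothetical entry point: starts a server and registers the handlers.
function start(port) {
  const server = http.createServer((req, res) => res.end('ok\n'));
  const shutdown = createShutdown(server);
  process.on('SIGINT', () => shutdown('SIGINT'));
  process.on('SIGTERM', () => shutdown('SIGTERM'));
  server.listen(port, () => {
    // Under PM2 with wait_ready enabled, signal that this worker can take traffic.
    if (process.send) process.send('ready');
  });
  return server;
}

module.exports = { createShutdown, start };
```

In a real service, `shutdown` is also the place to close database pools and queue consumers before calling `exit`.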
3) Keep Nginx config boring and explicit
4) Release script for zero-downtime deploy
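A release script can be as small as an ordered list of commands that aborts on the first failure. The sketch below is one possible shape, written in Node so it shares a language with the app. The step list, the `--env production` target, and the release directory layout are assumptions; `run` defaults to `execFileSync`, which throws when a command exits non-zero.

```javascript
const { execFileSync } = require('node:child_process');

// Ordered release steps. The exact steps and flags are illustrative assumptions.
function releaseSteps(releaseDir) {
  return [
    { cmd: 'npm', args: ['ci', '--omit=dev'], cwd: releaseDir },
    { cmd: 'pm2', args: ['reload', 'ecosystem.config.js', '--env', 'production'], cwd: releaseDir },
    { cmd: 'pm2', args: ['save'], cwd: releaseDir },
  ];
}

// Runs each step in order. A thrown error from `run` stops the release,
// so later steps never execute after a failure.
function runRelease(releaseDir, run = execFileSync) {
  for (const { cmd, args, cwd } of releaseSteps(releaseDir)) {
    console.log(`$ ${cmd} ${args.join(' ')}`);
    run(cmd, args, { cwd, stdio: 'inherit' });
  }
}

module.exports = { releaseSteps, runRelease };
```

Calling `runRelease` with the path of the freshly unpacked release performs the install and then a rolling `pm2 reload`, which restarts workers one at a time rather than all at once.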
5) Infrastructure baseline before first production deploy
- Harden SSH access and disable password authentication.
- Set up CloudWatch/Datadog monitoring for CPU, memory, disk, and error rates.
- Provision TLS certificates and renewal automation.
- Back up environment and deployment secrets outside the instance.
6) Release safety and rollback strategy
A rollback plan must be executable in minutes. Keep references to the last known good artifact and a scripted rollback path. If health checks fail after a deploy, rollback should be automatic or one command away.
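The health-check-gated rollback described here could be sketched as follows. `deployWithRollback`, `httpHealthy`, and the parameter names are hypothetical; `activate` stands in for whatever switches the running release (for example, repointing a symlink and reloading PM2). The probe relies on the global `fetch` and `AbortSignal.timeout` available in Node 18 and later.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Health probe: true only for a 2xx response within the timeout.
// The URL passed in is whatever health endpoint your service exposes.
async function httpHealthy(url, timeoutMs = 2000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch {
    return false; // network error or timeout counts as unhealthy
  }
}

// Activates `next`, polls health, and falls back to `lastGood` if every
// attempt fails. `activate` and `isHealthy` are supplied by the caller.
async function deployWithRollback({ activate, isHealthy, next, lastGood, attempts = 5, delayMs = 2000 }) {
  await activate(next);
  for (let i = 1; i <= attempts; i++) {
    if (await isHealthy()) return { ok: true, active: next };
    if (i < attempts) await sleep(delayMs);
  }
  await activate(lastGood);
  return { ok: false, active: lastGood };
}

module.exports = { httpHealthy, deployWithRollback };
```

Because the last known good reference is passed in explicitly, the same function doubles as the one-command manual rollback: call `activate(lastGood)` directly.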
7) Observability signals that matter most
- P95 and P99 latency by route.
- Non-2xx response rate over 5-minute windows.
- Process restarts and memory pressure per worker.
- Connection errors between app and dependent services.
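To make the first two signals concrete, here is a small sketch that computes them from in-memory request samples. The sample shape (`route`, `status`, `durationMs`, `ts`) and the function names are assumptions, and the percentile uses the simple nearest-rank method. In production these numbers would normally come from your monitoring system rather than hand-rolled code.

```javascript
// Nearest-rank percentile of an array of numbers, for p in (0, 100].
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// P95 and P99 latency per route from { route, status, durationMs, ts } samples.
function latencyByRoute(samples) {
  const byRoute = new Map();
  for (const s of samples) {
    if (!byRoute.has(s.route)) byRoute.set(s.route, []);
    byRoute.get(s.route).push(s.durationMs);
  }
  const out = {};
  for (const [route, durations] of byRoute) {
    out[route] = { p95: percentile(durations, 95), p99: percentile(durations, 99) };
  }
  return out;
}

// Fraction of non-2xx responses in the trailing window (default 5 minutes).
// `now` and `ts` are millisecond timestamps.
function non2xxRate(samples, now, windowMs = 5 * 60 * 1000) {
  const recent = samples.filter((s) => s.ts > now - windowMs && s.ts <= now);
  if (recent.length === 0) return 0;
  const bad = recent.filter((s) => s.status < 200 || s.status >= 300).length;
  return bad / recent.length;
}

module.exports = { percentile, latencyByRoute, non2xxRate };
```

Tracking percentiles per route rather than globally keeps one slow endpoint from hiding behind many fast ones.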
PM2 on EC2 FAQ
Q: Should we run one big EC2 instance or multiple smaller nodes? A: Prefer multiple nodes for resilience and controlled failure domains.
Q: Is PM2 enough for all scaling needs? A: PM2 handles process-level scaling well, but infrastructure auto-scaling and load balancing still matter.
