
Disaster Recovery

 
Our disaster recovery system is designed to restore full operation within the period specified in the Service Level Agreement (SLA). To protect against disasters and service disruptions, we rely on High Availability (HA) components and Hot/Cold Standby provisions, which maintain service continuity when unforeseen failures occur.
Datrics is equipped with HA components that add resilience against service interruptions, allowing the system to keep functioning even when individual components fail.
In addition, we use cloud databases and cloud storage. These services typically offer stronger reliability and disaster recovery capabilities than traditional self-managed systems, and their built-in redundancy and automatic failover provide an additional layer of protection against data loss and downtime.
To ensure data integrity and availability, we follow a protocol for the regular backup of all cloud storage and databases connected to the platform. The steps are:
  1. We schedule regular backups of all databases and storage connected to the platform. The backup frequency is determined by specific needs and the volume of data involved, so that recent data is protected as well.
  1. We store backups in a secure, offsite location. This protects against data loss in the event of a disaster at the primary site. Keeping copies of data in geographically separate locations mitigates the risk of data loss due to localised events.
  1. We regularly test recovery from the backups. This verifies that the data can be restored accurately and promptly in the event of a disaster. Simulating disaster scenarios and practising the recovery process also helps us keep recovery times short.
  1. We review and update the backup and recovery procedures regularly to accommodate changes in the data and the operational environment, so that the disaster recovery plan remains effective and relevant as our systems evolve.
By adhering to these guidelines, we can recover services swiftly in the event of a disaster and minimise the impact on our users. Our disaster recovery plan is not just about restoring services; it is about maintaining trust, reducing downtime, and preserving the integrity of our users' data.
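
As an illustration of the scheduling step above, the following minimal Python sketch flags components whose most recent backup is missing or older than an allowed maximum age. The component names, the age limits, and the `find_stale_backups` helper are assumptions made for this example; they are not part of Datrics, and real limits would come from the SLA.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical maximum backup age per component; real values come from the SLA.
MAX_BACKUP_AGE = {
    "postgres": timedelta(hours=24),
    "cloud-storage": timedelta(hours=24),
    "redis": timedelta(hours=6),
}


def find_stale_backups(last_backup_at, now, max_age=MAX_BACKUP_AGE):
    """Return the sorted names of components whose latest backup is missing or too old."""
    stale = []
    for component, limit in max_age.items():
        taken_at = last_backup_at.get(component)
        if taken_at is None or now - taken_at > limit:
            stale.append(component)
    return sorted(stale)


# Example data: timestamps would normally come from the cloud provider's backup listing.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_backup_at = {
    "postgres": now - timedelta(hours=3),
    "cloud-storage": now - timedelta(hours=30),
    # no Redis backup has been recorded at all
}
stale = find_stale_backups(last_backup_at, now)
print(stale)  # ['cloud-storage', 'redis']
```

A check of this kind could run on a schedule and raise an alert whenever the returned list is not empty.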
 
To keep the production system running smoothly, it is crucial to be able to back up and restore its four critical components: Datrics Application Services, cloud storage, the Postgres database, and Redis.
 
Datrics Application Services is a group of highly available, stateless services that run within a Kubernetes cluster. Because the services are stateless, recovering them after a failure is a fast procedure. The system administrator can either redeploy or restart the services in the cluster, or recreate the entire cluster using Terraform/CloudFormation scripts or Helm charts, depending on the initial installation process.
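
The restart option can be sketched in Python as a small wrapper around `kubectl rollout restart` and `kubectl rollout status`. The deployment names and the namespace below are hypothetical placeholders, not the names used by an actual Datrics installation, and the helper functions are assumptions made for illustration. The sketch defaults to a dry run that only prints the commands.

```python
import subprocess


def restart_commands(deployments, namespace):
    """Build kubectl commands that restart each deployment and wait for its rollout."""
    commands = []
    for name in deployments:
        target = f"deployment/{name}"
        commands.append(["kubectl", "rollout", "restart", target, "-n", namespace])
        commands.append(
            ["kubectl", "rollout", "status", target, "-n", namespace, "--timeout=300s"]
        )
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Hypothetical deployment and namespace names, for illustration only.
commands = restart_commands(["datrics-backend", "datrics-frontend"], "datrics")
run_all(commands)  # dry run: prints the commands without calling kubectl
```

Calling `run_all(commands, dry_run=False)` would execute the commands and stop at the first failure, which assumes `kubectl` is installed and configured for the target cluster.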
 
Cloud storage, such as S3, GCS, or Azure Blob Storage, is one of the most widely used storage options. Cloud platforms provide their own backup and restoration procedures for data stored in these services. The system administrator must ensure regular backups and rehearse the recovery procedures to minimise the risk of data loss.
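
One way to rehearse a recovery is to compare a restored copy against its source by checksum. The Python sketch below does this for two local directories, which stand in for a bucket and its restored copy. This is an assumption made to keep the example self-contained: a real check would list and download objects through the cloud provider's SDK. The `checksum_manifest` and `compare_restore` helpers are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_restore(source, restored):
    """Return (missing, corrupted): files absent from, or different in, the restored copy."""
    src, dst = checksum_manifest(source), checksum_manifest(restored)
    missing = sorted(set(src) - set(dst))
    corrupted = sorted(name for name in src.keys() & dst.keys() if src[name] != dst[name])
    return missing, corrupted


# Demo: two temporary directories stand in for a bucket and its restored copy.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as restored_dir:
    (Path(src_dir) / "model.bin").write_bytes(b"weights")
    (Path(src_dir) / "data.csv").write_bytes(b"id,value\n1,2\n")
    (Path(restored_dir) / "data.csv").write_bytes(b"id,value\n1,3\n")  # differs from source
    result = compare_restore(src_dir, restored_dir)
    print(result)  # (['model.bin'], ['data.csv'])
```

A rehearsal passes only when both lists are empty, meaning every source file exists in the restored copy with identical contents.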
 
For the Postgres database, we recommend using a managed database that is part of the cloud infrastructure provided by AWS, GCP, or Azure. Managed databases offer backup, recovery, and scalability methods, which makes data recovery a straightforward process.
 
For Redis, we recommend a highly available, replicated managed Redis service from the cloud provider of choice, as the provider supplies methods to back up, recover, and scale this service.
 
Note that Datrics itself does not require backup, as it stores all its data inside the services mentioned above. In the event of any issues, the Datrics team will provide support to recover all systems within the time frame specified in the SLA.
 
Following these guidelines leaves the production system better equipped to handle potential data loss and maintain seamless operations.