Disaster Recovery Options for Solr Deployments

Mar. 31, 2020

Tom Humbarger


Why Do We Need Disaster Recovery?

Machines and technology fail. Natural or man-made disasters occur. Networks go down. Any of these situations can cause a website or application outage, and sooner or later one will. Meanwhile, today’s customers expect any website they access to be available at all times, and they have minimal tolerance for inconvenience caused by system outages.

Instead of hoping that disasters will bypass them, most companies meet their business requirement to minimize customer disruptions by deploying disaster recovery (DR) options that deliver high levels of uptime. The disaster recovery options for SearchStax Cloud, our Solr-as-a-Service solution, are an insurance policy against the unexpected. They provide the highest level of customer service while reducing risk and removing uncertainty about how much downtime your business will sustain.

Why are RTO and RPO Important in Disaster Recovery?

As noted in an earlier SearchStax blog post, “The Important Rs for Your Solr Disaster Recovery Plan,” there are two key terms with respect to disaster recovery:

  • Recovery Time Objective (RTO) is the maximum amount of time that your Solr service can be unavailable in an emergency.
  • Recovery Point Objective (RPO) is the maximum amount of data that your business can tolerate losing in an emergency.

RPO determines how frequently you need to back up your data and/or synchronize it across your infrastructure. For example, if your RPO is two hours, you need to make sure that a backup taken within the last two hours is always available.

RTO determines how quickly you must be able to recreate your infrastructure and recover the data from your backup.
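
The link between RPO and backup frequency can be illustrated with a short Python sketch. It assumes simple periodic backups, where a failure just before the next backup loses up to one full interval of data. The function names are hypothetical and are not part of any SearchStax or Solr API.

```python
from datetime import datetime, timedelta


def schedule_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if a periodic backup schedule can satisfy the RPO.

    Assumption: the worst-case data loss equals one full backup
    interval, so the interval must not exceed the RPO.
    """
    return backup_interval <= rpo


def backup_is_fresh(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True if the most recent backup still falls inside the RPO window."""
    return now - last_backup <= rpo


rpo = timedelta(hours=2)
print(schedule_meets_rpo(timedelta(hours=1), rpo))   # hourly backups
print(schedule_meets_rpo(timedelta(hours=24), rpo))  # daily backups
```

With a two-hour RPO, an hourly schedule passes the check and a daily schedule does not.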

Disaster Recovery Options for Solr Deployments

SearchStax offers three disaster recovery options for Solr deployments to meet a range of business requirements:

  • Hot Disaster Recovery
  • Warm Disaster Recovery
  • Cold Disaster Recovery

The differences are based on how long it takes to restore your service and how much data loss your business is willing to tolerate in the event of a disaster.

Under all Disaster Recovery options, SearchStax takes on the responsibility of restoring your service to full operation following a disaster event. Disaster recovery efforts will be initiated upon customer request or when the managed Solr service is unavailable for five minutes.
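
The trigger rule above (a customer request, or five minutes of unavailability) can be sketched in Python. This is only an illustration of the rule as stated, not a description of how SearchStax monitoring is implemented. The health-check data format and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Threshold taken from the text: five minutes of unavailability.
OUTAGE_THRESHOLD = timedelta(minutes=5)


def should_trigger_dr(
    checks: Iterable[Tuple[datetime, bool]],
    now: datetime,
    customer_requested: bool = False,
    threshold: timedelta = OUTAGE_THRESHOLD,
) -> bool:
    """Decide whether disaster recovery should start.

    `checks` is a hypothetical series of (timestamp, healthy) results.
    DR starts on customer request, or when the service has been
    continuously unhealthy for at least `threshold`.
    """
    if customer_requested:
        return True

    outage_start: Optional[datetime] = None
    for timestamp, healthy in sorted(checks, key=lambda c: c[0]):
        if healthy:
            outage_start = None        # any success resets the outage
        elif outage_start is None:
            outage_start = timestamp   # first failure of the current outage

    return outage_start is not None and now - outage_start >= threshold
```

Resetting the outage on any successful check means that a brief blip followed by recovery does not count toward the five minutes.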

Hot Disaster Recovery for Solr Deployments

For businesses with aggressive RTO and RPO requirements, a duplicate infrastructure that mirrors your primary production environment in real-time may be needed. SearchStax offers a Hot Disaster Recovery option that augments your production deployment and provides the highest level of redundancy and resource capacity during a disaster.

To achieve this level of redundancy, we create a “hot” standby or secondary deployment in a different region than your production deployment. This secondary deployment is a full replica of your primary production environment and is kept in sync with your production system at all times. That means that an interruption in service at the primary site for the production system can be met by a near instantaneous switchover to the standby site.

To ensure that no disaster can knock out the primary and backup systems, we architect the system to have the standby system run in a different cloud-provider region.

The Hot Disaster Recovery option has a service level agreement (SLA) to restore your site with full functionality within 10 minutes, but in practice recovery will likely take less than 10 seconds after a failure of the primary environment.

Warm Disaster Recovery for Solr Deployments

The Warm Disaster Recovery option is similar to the Hot Disaster Recovery option, except that the secondary deployment is not as redundant or resilient as the production system. For the Warm standby site, we create a scaled-down or skinny version of the main deployment in the secondary region. This option provides benefits similar to those of the Hot option, but at a lower cost.

The difference is that the standby site will not have the same redundancy and capacity as the production site, so new nodes must be added on demand to match the capabilities of the production site. The backup system will be available almost immediately, but full redundancy is restored only after those additional nodes are in place.


The Warm Disaster Recovery option has a service level agreement (SLA) to restore your site within 10 minutes and will provide full replication functionality in under 4 hours.

Cold Disaster Recovery for Solr Deployments

For some businesses, minimizing downtime is not a critical requirement. Under the Cold Disaster Recovery option, the restoration process starts after the disaster occurs. A new Solr cluster is created, and the data and configurations are then restored from a backup file.

The SearchStax Cold Disaster Recovery option handles this process for you. While everyone should maintain regular backups of their deployments, this option takes backups to the next level by storing them in a different cloud region from the primary system, which provides a higher level of recovery than backups stored in the same region. If the production site goes down, we will start a new deployment in the same region as the backup and restore the backup to it.


What are the Details Behind Each of the Disaster Recovery Options?

The details for the disaster recovery options offered through SearchStax are summarized in the table below.

|  | Hot Disaster Recovery | Warm Disaster Recovery | Cold Disaster Recovery |
| --- | --- | --- | --- |
| Secondary Environment | Active full-size Solr deployment in another cloud region | Scaled-down Solr deployment in another cloud region | Created on-demand in another cloud region or same region upon request |
| DR Trigger | Upon customer request or when Solr service is unavailable for 5 minutes | Upon customer request or when Solr service is unavailable for 5 minutes | Created on-demand upon customer request or when Solr service is unavailable for 5 minutes |
| DR Recovery Process | DR triggers a DNS failover | DR triggers a DNS failover | DR triggers new cluster deployment, backup restoration and DNS failover |
| RPO Service Level (SLA) | 24 hours to as low as 10 minutes | 24 hours to as low as 10 minutes | 24 hours to as low as 3 hours |
| RTO Service Level (SLA) | Less than 10 minutes | Less than 10 minutes | 8 hours (typically 2 to 4 hours) |
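
As a rough aid for reading the table, the Python sketch below filters the three options against a required RTO and RPO. The figures are copied from the table. The selection logic, the type, and the function name are assumptions made for illustration and are not a SearchStax tool. "Less than 10 minutes" is treated as a 10-minute upper bound.

```python
from datetime import timedelta
from typing import List, NamedTuple


class DrOption(NamedTuple):
    name: str
    rto_sla: timedelta   # upper bound on recovery time, from the table
    best_rpo: timedelta  # lowest RPO listed for the option in the table


# Values taken from the RPO and RTO rows of the table.
OPTIONS = [
    DrOption("Hot", timedelta(minutes=10), timedelta(minutes=10)),
    DrOption("Warm", timedelta(minutes=10), timedelta(minutes=10)),
    DrOption("Cold", timedelta(hours=8), timedelta(hours=3)),
]


def candidate_options(required_rto: timedelta, required_rpo: timedelta) -> List[str]:
    """Return the options whose SLA figures fit within the requirements."""
    return [
        option.name
        for option in OPTIONS
        if option.rto_sla <= required_rto and option.best_rpo <= required_rpo
    ]


print(candidate_options(timedelta(minutes=15), timedelta(minutes=30)))
print(candidate_options(timedelta(hours=12), timedelta(hours=6)))
```

Hot and Warm share the same RTO and RPO figures, so this filter cannot separate them. The choice between them depends on cost and on whether reduced capacity is acceptable until the Warm site is scaled up.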

Disaster Recovery with CDCR

SearchStax also offers a Solr disaster recovery option using Cross Data Center Replication (CDCR), which can provide nearly instantaneous synchronization of your data and reduce RPO to minutes. See the blog post “Disaster Recovery with CDCR Provides Near Real-Time RPO” for additional details on the benefits, use cases and limitations. [Note that CDCR is only a stable feature in Solr 7, is not recommended in Solr 8 and is not a feature in Solr 9.]

Now that you have a clear explanation of the SearchStax options for Disaster Recovery for Solr deployments, you will be able to decide for yourself which option best fits your business case and needs.

By Tom Humbarger

Senior Marketing Programs Manager
