10 Best Practices for Taking Solr to Production

February 08, 2023

Karan Jeet Singh


Following the top Solr best practices is critical when you are moving your Apache Solr infrastructure from development to production, whether you’re using Sitecore, AEM, Drupal, or your own custom application. It is quite straightforward to install Solr on your local machine and load a collection; however, when you start getting ready for production, you discover how daunting it can actually be. 

Top 10 Solr Best Practices

Here are our top 10 Solr best practices for getting your system ready for production, the ones that Solr doesn’t print on its “instruction label.”

  1. Implement Zookeeper 
  2. Achieve High Availability
  3. Follow Security Protocols with TLS
  4. Maintain Access Restrictions
  5. Distribute Queries with a Load Balancer
  6. Determine What and When to Log
  7. Monitor the Health of your Solr Deployment 
  8. Create a Backup Strategy
  9. Determine When to Expand Storage
  10. Keep Up with Vulnerability Patches

1. Don’t Neglect Your Zookeeper

Zookeeper is a configuration-management application that comes prepackaged with Solr. It works well in the local environment, but as you increase the traffic or number of nodes, the prepackaged Zookeeper falls short of delivering proper support to your Solr cluster.

When taking your Solr to production:

  • Install Zookeeper separately from Solr.
  • Change the Solr configuration so that it uses this separate Zookeeper.
  • Run one Zookeeper node per Solr node, unless you have only two Solr nodes. In that case, run three Zookeeper nodes, because an ensemble needs a majority of its nodes up to keep a quorum and a two-node ensemble cannot survive the loss of either node.

Doing all this will ensure that all the internal workings of Solr can keep on going smoothly even under the heaviest of traffic.
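
The sizing advice above follows from how Zookeeper keeps a quorum: the ensemble stays available only while a majority of its nodes is up. Here is a minimal Python sketch of that arithmetic, together with the connection string Solr reads from its ZK_HOST setting. The host names and the /solr chroot are placeholders for illustration, not values from this article.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Number of Zookeeper nodes that can fail while a majority remains."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return (ensemble_size - 1) // 2


def zk_host(hosts: list[str], port: int = 2181, chroot: str = "/solr") -> str:
    """Build the comma-separated connection string used for ZK_HOST."""
    return ",".join(f"{host}:{port}" for host in hosts) + chroot


# Hypothetical host names, for illustration only.
ensemble = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
print(zk_host(ensemble))
for size in (1, 2, 3, 5):
    print(size, "node(s) tolerate", tolerated_failures(size), "failure(s)")
```

A two-node ensemble tolerates no failures, the same as a single node, which is why three is the smallest ensemble worth running.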

2. Build Safeguards Against Downtime

You do not want your production system to go down, ever, which is why it is so important that your Solr setup supports high availability. You can achieve this by:

  • Deploying Solr across multiple nodes.
  • Putting a replica of each collection on at least two nodes.

This will ensure that even if one of the nodes goes down, you will still have access to all of your data.
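
As a rough illustration, the replica count can be set when the collection is created through the Collections API. The sketch below only builds the request URL. The base URL and collection name are hypothetical, and a real call would also need whatever authentication your cluster requires.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int = 1,
                          replication_factor: int = 2) -> str:
    """Build a Collections API CREATE URL with at least two replicas per shard."""
    if replication_factor < 2:
        raise ValueError("use at least 2 replicas per shard for high availability")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical endpoint and collection name.
print(create_collection_url("https://solr.example.com/solr", "products"))
```

With a replication factor of 2 on a cluster of two or more nodes, each shard has a copy on a second node, so losing one node does not make the data unavailable.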

3. Follow TLS Security Protocols

When setting up your production Solr, it is very important to apply every security measure available to you. The first one is TLS, or Transport Layer Security, which encrypts the traffic between Solr and your application. You can either:

  • Update your Solr configuration and enable TLS, or
  • Add a load balancer in front of Solr and enable TLS there.

Also, you should enable disk encryption on your data disk so that all the data stored by Solr is encrypted.
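
On the application side, TLS only protects you if the client actually verifies the server certificate. The following is a minimal client-side sketch using Python's standard library. The URL and CA file path are hypothetical, and the sketch assumes TLS is already enabled on Solr or on the load balancer in front of it.

```python
import ssl
import urllib.request
from typing import Optional


def tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a client context that verifies the certificate and host name."""
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def system_info_status(solr_url: str, context: ssl.SSLContext) -> int:
    """Request Solr's system info endpoint over HTTPS and return the HTTP status."""
    url = f"{solr_url}/admin/info/system"
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return response.status


# Example call (hypothetical URL and CA bundle, not executed here):
# status = system_info_status("https://solr.example.com/solr",
#                             tls_context("/etc/ssl/my-ca.pem"))
```

Passing a CA file is only needed when the certificate is signed by a private authority; otherwise the system trust store is used.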

4. Maintain Security Access Restrictions

After encrypting your connection to Solr, it is crucial to restrict any unwanted access to it. Solr lets you enable basic authentication for your system, and it lets you add multiple users to it. Users can have different levels of access to Solr.

Another great way to secure your Solr is by applying IP filtering to your instance. It lets you block access to all ports except the ones used by Solr and Zookeeper. It also lets you control incoming traffic by allowing access only from specified IP/CIDR ranges. So:

  • Enable basic authentication for Solr.
  • Add IP/CIDR filtering in your security group.
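
To make the two checks concrete, here is a small sketch of what they amount to: a CIDR allow-list test and an HTTP Basic Authorization header. The IP ranges and credentials are made up for illustration. In practice the filtering is enforced by your security group or firewall, not by application code.

```python
import base64
import ipaddress


def is_allowed(client_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build the HTTP Basic Authorization header for a Solr request."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


# Hypothetical allow-list and credentials.
ALLOWED = ["10.0.0.0/16", "192.168.1.0/24"]
print(is_allowed("10.0.1.25", ALLOWED))
print(is_allowed("203.0.113.9", ALLOWED))
print(basic_auth_header("solr_reader", "change-me"))
```

Basic authentication sends credentials in a reversible encoding, so it should only be used over a TLS-encrypted connection.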

5. Distribute Queries With A Load Balancer

A load balancer distributes incoming queries across Solr nodes. Putting a load balancer on top of your Solr cluster helps you achieve a truly distributed and highly-available architecture.

Some clients rely on a combination of SolrCloud and Zookeeper to distribute their queries: the client library gets the IP addresses of the Solr nodes from Zookeeper and then queries them randomly. This method is not recommended, because it is not universally supported and it relies on accessing the Solr nodes directly. A load balancer, on the other hand, adds a layer of security on top of your Solr cluster and is a universally supported way of accessing your Solr cluster in a distributed fashion.

Here are some load balancers that you can deploy:

  • Load balancer provided by your cloud provider
  • Nginx
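
To show what "distributing queries" means in practice, here is a toy round-robin selector that skips nodes marked unhealthy. It illustrates the idea only and is not how Nginx or any cloud load balancer is implemented. The node URLs are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate through Solr nodes, skipping any that are marked unhealthy."""

    def __init__(self, nodes: list[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._healthy = set(nodes)
        self._ring = cycle(self._nodes)

    def mark_down(self, node: str) -> None:
        self._healthy.discard(node)

    def mark_up(self, node: str) -> None:
        if node in self._nodes:
            self._healthy.add(node)

    def next_node(self) -> str:
        # Try each node at most once per call before giving up.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node in self._healthy:
                return node
        raise RuntimeError("no healthy Solr nodes available")


# Hypothetical node URLs.
balancer = RoundRobinBalancer(["http://solr1:8983", "http://solr2:8983"])
balancer.mark_down("http://solr2:8983")
print(balancer.next_node())
```

A real load balancer adds health checks, connection handling, and TLS termination on top of this basic rotation.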

6. Determine What To Log

Logging keeps you informed about conditions inside Solr. Solr provides granular control over the logging level of each component. It is very important that you:

  • Go through the complete list of Solr loggers to figure out which information is useful to you.
  • Set the level of logging that you want for each component.
  • Make sure that the log rotation policies are set in accordance with your needs:
    • Rotate logs based on time
    • Rotate logs based on size
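
Per-component levels can be changed at runtime through Solr's logging endpoint, which accepts a set=<logger>:<LEVEL> form parameter. The sketch below only prepares the request bodies and sends nothing. The base URL and the choice of loggers are illustrative assumptions. Levels set this way do not survive a restart; permanent settings and rotation policies belong in Solr's Log4j configuration.

```python
from urllib.parse import urlencode

# Log4j level names.
VALID_LEVELS = {"ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"}


def log_level_requests(base_url: str, levels: dict[str, str]) -> list[tuple[str, bytes]]:
    """Return one (url, form body) pair per logger whose level should change."""
    url = f"{base_url}/admin/info/logging"
    requests = []
    for logger, level in levels.items():
        level = level.upper()
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown log level: {level}")
        body = urlencode({"set": f"{logger}:{level}"}).encode("ascii")
        requests.append((url, body))
    return requests


# Hypothetical choice: quiet everything, keep update-handler detail.
wanted = {"root": "WARN", "org.apache.solr.update": "INFO"}
for url, body in log_level_requests("http://localhost:8983/solr", wanted):
    print(url, body)
```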

 

For more about configuring logging, check out the “Configuring Logging” section of the Apache Solr Reference Guide.

7. Consistently Monitor Your Solr Deployments

It is very important to monitor all aspects of your Solr deployment regularly and consistently. Having a clear view of system-health metrics, such as memory, system load average, and JVM heap, helps you avoid CPU overload and out-of-memory errors. Collection-level metrics, such as indexing performance, search latency, and errors occurring during searching and indexing, help you tune the dynamic behavior of your system.

You can do so by:

  • Using metrics provided by your cloud provider
  • Having a set of APIs that can read and report system statistics. 
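
As one example of reading and reporting a system statistic, the sketch below computes JVM heap usage from a response shaped like the output of Solr's Metrics API when queried with group=jvm and prefix=memory.heap. The exact response layout is an assumption and may differ between Solr versions. The sample numbers and the 85% threshold are made up.

```python
def heap_usage_ratio(metrics_response: dict) -> float:
    """Return used/max JVM heap from a Metrics API style response."""
    jvm = metrics_response["metrics"]["solr.jvm"]
    return jvm["memory.heap.used"] / jvm["memory.heap.max"]


def heap_alert(metrics_response: dict, threshold: float = 0.85) -> bool:
    """True when heap usage is at or above the alert threshold."""
    return heap_usage_ratio(metrics_response) >= threshold


# Hand-written sample in the assumed response shape (values in bytes).
sample = {
    "metrics": {
        "solr.jvm": {
            "memory.heap.used": 3 * 1024**3,
            "memory.heap.max": 4 * 1024**3,
        }
    }
}
print(f"heap usage: {heap_usage_ratio(sample):.0%}, alert: {heap_alert(sample)}")
```

Running a check like this on a schedule and alerting on sustained high values gives you time to act before an out-of-memory error occurs.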

 

Solution providers, such as SearchStax, enable these features out of the box and publish their monitoring documentation so that others can see what to expect. These features are essential for a smooth-running deployment. SearchStax has also created a blog post with the four critical metrics to monitor Solr health.

8. Back Up Solr Data Regularly

It is crucial to protect your Solr data. Make regular backups of your data so you can restore your system quickly and efficiently in case of a failure.

It is important that you:

  • Schedule a backup to happen at least once per day.
  • Store backups in a storage account outside of your deployment.
  • Set a retention policy to clean up out-of-date backups.
  • Do periodic backup verifications.

Also, backing up a multi-node system can be a complex task. You have to mount a shared drive across all the nodes and then trigger backups using the BACKUP action of the Collections API.
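
The checklist above can be sketched as two small helpers: one builds the BACKUP request for the Collections API, and one picks out backups that have outlived the retention window. The base URL, collection name, mount path, naming scheme, and 7-day retention are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def backup_url(base_url: str, collection: str, location: str, now: datetime) -> str:
    """Build a Collections API BACKUP URL with a timestamped backup name."""
    params = {
        "action": "BACKUP",
        "name": f"{collection}-{now:%Y%m%d-%H%M%S}",
        "collection": collection,
        "location": location,  # must be a path shared by all Solr nodes
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def expired_backups(backups: dict[str, datetime], now: datetime,
                    keep_days: int = 7) -> list[str]:
    """Return the names of backups taken before the retention cutoff."""
    cutoff = now - timedelta(days=keep_days)
    return sorted(name for name, taken in backups.items() if taken < cutoff)


# Hypothetical values.
now = datetime(2023, 2, 8, 2, 0, 0)
print(backup_url("http://localhost:8983/solr", "products", "/mnt/solr-backups", now))
existing = {
    "products-old": now - timedelta(days=10),
    "products-recent": now - timedelta(days=1),
}
print(expired_backups(existing, now))
```

Scheduling, copying the backup to external storage, and verifying restores are left to your job scheduler and storage tooling.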

9. Expand Your Storage As You Grow

It is difficult to predict how large your Solr collection will be when you are first starting out. It is always advisable to change the data directory of your Solr deployment to a dedicated storage device that can be expanded as your data grows. All the major cloud providers allow you to attach an extra storage device to your instance. It can be increased in size later if need be.

  • Add a separate data disk to your instance.
  • Change the data directory of Solr from default to the added disk.
  • Change the size of the disk as required.
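
To decide when to resize the disk, you can watch how full the data volume is. Here is a minimal sketch using Python's standard library. The 80% threshold and the data path are assumptions, so pick values that match your growth rate and how long a resize takes with your provider.

```python
import shutil


def usage_ratio(used_bytes: int, total_bytes: int) -> float:
    """Fraction of the disk that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_expansion(path: str, threshold: float = 0.8) -> bool:
    """True when the volume holding `path` is at or above the usage threshold."""
    usage = shutil.disk_usage(path)
    return usage_ratio(usage.used, usage.total) >= threshold


# Hypothetical Solr data directory on the dedicated disk:
# print(needs_expansion("/var/solr/data"))
print(needs_expansion("."))
```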

10. Patch Your Vulnerabilities

A typical Solr deployment lives on one or more servers that are reached through IP addresses or URLs and the ports open on those systems. Network settings can sometimes create vulnerabilities in your architecture. It is important to keep an eye on what can be accessed via the internet, and what potential exploits you might be inviting. For instance, just changing the header size in jetty.xml might lead to a potential DDoS exploit.

When you deploy Solr in production, make sure that you only update the settings that you absolutely must change. It’s a good practice to do a regular vulnerability scan on your servers to see if there are any specific vulnerabilities that need to be addressed. In addition, it’s best to keep yourself updated with the latest discovered vulnerabilities and monitor the Apache Solr mailing lists. Once you’ve discovered a vulnerability for the version of Solr you are using, it’s best to apply the patch in a timely manner.

Put the Top Solr Best Practices To Work

It’s one thing to know what the best practices are. It’s another thing to actually put them into use. If you haven’t tried implementing these as part of your day-to-day Solr infrastructure operations, then you should expect to go through many iterations before you get it right. However, you can shorten the learning curve by working with a partner, such as SearchStax, who already knows how to navigate the challenges you might face. That way you get a solution that fits your needs, letting you focus on improving other parts of your operation.

 

Follow our Top Solr Best Practices…Or You Could Trust SearchStax to Manage Your Solr for You

Let us solve the technical aspects of your Solr infrastructure. You don’t have to go it alone trying to implement these best practices. Have a conversation with one of our Solr experts to discover how we can make that burden easier to manage. 

 

SearchStax Managed Search is a fully-managed SaaS solution that automates, manages, maintains and scales Solr search infrastructure in public or private clouds. We take care of Solr and make sure you have a reliable, secure and compliant Solr setup so you can focus on more value-added tasks.

Schedule a demo or start a free trial today.

By Karan Jeet Singh

Solutions Engineer
