February 08, 2023
Karan Jeet Singh
Following the top Solr best practices is critical when you are moving your Apache Solr infrastructure from development to production, whether you’re using Sitecore, AEM, Drupal, or your own custom application. It is quite straightforward to install Solr on your local machine and load a collection; however, when you start getting ready for production, you discover how daunting it can actually be.
Here are our top 10 Solr best practices for getting your system ready for production, the ones that don't come printed on Solr's “instruction label.”
Zookeeper is the coordination service SolrCloud uses to manage cluster state and configuration, and an embedded copy of it comes prepackaged with Solr. The embedded Zookeeper works well in a local environment, but it runs inside a single Solr node. As traffic or the number of nodes grows, it falls short of the support your Solr cluster needs, and it becomes a single point of failure.
When taking your Solr to production:
Doing all this will ensure that all the internal workings of Solr can keep on going smoothly even under the heaviest of traffic.
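A common way to move off the embedded Zookeeper is to run a separate Zookeeper ensemble with an odd number of servers and point every Solr node at it. This is typically done through the ZK_HOST setting in solr.in.sh or the `-z` option of `bin/solr start`. The Python sketch below builds that connection string and shows the quorum arithmetic behind the odd-size rule. The host names and the `/solr` chroot are hypothetical.

```python
def zk_connection_string(hosts, port=2181, chroot="/solr"):
    """Build a Zookeeper connection string such as 'zk1:2181,zk2:2181,zk3:2181/solr'."""
    if not hosts:
        raise ValueError("at least one Zookeeper host is required")
    ensemble = ",".join(f"{host}:{port}" for host in hosts)
    # The optional chroot keeps Solr's znodes under one path of a shared ensemble.
    return ensemble + (chroot or "")


def quorum_size(ensemble_size):
    """Smallest majority of the ensemble; Zookeeper needs this many servers up."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many Zookeeper servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)
```

For example, `zk_connection_string(["zk1", "zk2", "zk3"])` returns `"zk1:2181,zk2:2181,zk3:2181/solr"`. A four-server ensemble tolerates no more failures than a three-server one (one in both cases), which is why odd sizes are the usual choice.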
You never want your production system to go down, which is why your Solr setup must support high availability. You can achieve this by:
This will ensure that even if one of the nodes goes down, you will still have access to all of your data.
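To see why replicas matter, check where each shard's replicas actually live. The sketch below takes a hypothetical shard-to-nodes layout, the kind of information the Collections API's CLUSTERSTATUS action reports, although parsing that response is left out here. It then reports which shards would become unavailable if one node failed. It also builds a Collections API CREATE request with `replicationFactor=2`. The URL, collection name, and shard count are assumptions for illustration.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE request; base_url is e.g. 'http://solr:8983/solr'."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def shards_lost_if_node_fails(layout, failed_node):
    """Return the shards whose every replica lives on failed_node.

    layout maps a shard name to the list of node names hosting its replicas.
    """
    return sorted(
        shard for shard, nodes in layout.items() if set(nodes) <= {failed_node}
    )


def survives_any_single_node_failure(layout):
    """True if no single node failure leaves a shard without a live replica."""
    nodes = {node for hosts in layout.values() for node in hosts}
    return all(not shards_lost_if_node_fails(layout, node) for node in nodes)
```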
When setting up Solr for production, apply every security measure available to you. The first is TLS (Transport Layer Security), which encrypts the traffic between Solr and your application. You can either:
You should also enable disk encryption on your data disk so that all the data Solr stores is encrypted at rest.
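On the client side, TLS only protects you if the application actually verifies Solr's certificate. Here is a minimal sketch using Python's standard-library `ssl` module. It assumes your Solr nodes serve HTTPS and that `ca_file`, a hypothetical path, holds the CA certificate that signed their certificates.

```python
import ssl
from typing import Optional


def solr_client_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client-side TLS context for talking to Solr over HTTPS.

    ca_file is a hypothetical path to your internal CA bundle; when it is None,
    the system's default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy protocol versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already sets these; setting them explicitly makes it
    # obvious if someone later tries to switch certificate verification off.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

The context can then be passed to `urllib.request.urlopen(url, context=...)` for requests to an `https://` Solr URL.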
Once your connection to Solr is encrypted, restrict any unwanted access to it. Solr supports basic authentication with multiple users, and each user can be given a different level of access to Solr.
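Solr's BasicAuthPlugin reads its users from security.json, and the RuleBasedAuthorizationPlugin maps those users to roles. The Python sketch below generates such a file. It assumes the credential format of Solr's default SHA-256 provider: the Base64 of a double SHA-256 hash of salt plus password, a space, then the Base64 salt. The user names, passwords, and the `admin`/`search` roles are hypothetical. In practice `bin/solr auth` or the Authentication API's `set-user` command are the usual ways to manage users, so check this format against the Reference Guide for your Solr version.

```python
import base64
import hashlib
import json
import os


def solr_credential(password, salt=None):
    """Return 'base64(hash) base64(salt)' where hash = sha256(sha256(salt + password))."""
    salt = salt if salt is not None else os.urandom(32)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    digest = hashlib.sha256(digest).digest()
    return f"{base64.b64encode(digest).decode()} {base64.b64encode(salt).decode()}"


def build_security_json(users):
    """users maps a user name to (password, role); roles here are 'admin' or 'search'."""
    return {
        "authentication": {
            "class": "solr.BasicAuthPlugin",
            "blockUnknown": True,  # reject requests that carry no credentials
            "credentials": {name: solr_credential(pw) for name, (pw, _) in users.items()},
        },
        "authorization": {
            "class": "solr.RuleBasedAuthorizationPlugin",
            # Permissions are matched in order, so the specific ones come before "all".
            "permissions": [
                {"name": "security-edit", "role": "admin"},
                {"name": "read", "role": ["admin", "search"]},
                {"name": "all", "role": "admin"},
            ],
            "user-role": {name: role for name, (_, role) in users.items()},
        },
    }
```

Usage: `json.dumps(build_security_json({"solr-admin": ("change-me", "admin"), "app-search": ("change-me-too", "search")}), indent=2)` produces the file contents, with a search-only user that can read but not change the cluster.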
Another effective way to secure Solr is to apply IP filtering to your instance. It lets you block every port except the ones used by Solr and Zookeeper, and it lets you control incoming traffic by accepting connections only from specified IP/CIDR ranges.
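In production, IP filtering usually lives in a firewall or cloud security group rather than in application code, but the allowlist logic is the same. Here is a minimal sketch using Python's `ipaddress` module. The CIDR ranges are hypothetical, and 8983 and 2181 are the default Solr and Zookeeper client ports.

```python
import ipaddress

# Hypothetical allowlist: an application subnet and a small VPN range.
ALLOWED_NETWORKS = [ipaddress.ip_network(c) for c in ("10.0.1.0/24", "192.168.50.0/28")]
# Default Solr and Zookeeper client ports; every other port stays closed.
OPEN_PORTS = {8983, 2181}


def is_allowed(client_ip, port, networks=ALLOWED_NETWORKS, open_ports=OPEN_PORTS):
    """Allow a connection only to an open port and only from an allowlisted range."""
    if port not in open_ports:
        return False
    address = ipaddress.ip_address(client_ip)
    # An IPv6 address is never "in" an IPv4 network, so it is rejected here.
    return any(address in network for network in networks)
```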
A load balancer distributes incoming queries across Solr nodes. Putting one in front of your Solr cluster helps you achieve a truly distributed, highly available architecture.
Some clients combine SolrCloud and Zookeeper to access Solr in a distributed way: the client library fetches the IP addresses of the Solr nodes from Zookeeper and then queries them at random. We don't recommend this approach, because not every client library supports it and it requires direct access to the Solr nodes. A load balancer, by contrast, adds a layer of security in front of your Solr cluster and is a universally supported way to access the cluster in a distributed fashion.
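A dedicated load balancer does this work for you, but the core idea fits in a few lines. The Python sketch below is a hypothetical round-robin selector that skips nodes marked unhealthy. In a real balancer, the health flag would come from periodic checks against each node, for example a request to Solr's ping handler. That checking is assumed here, not implemented.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical round-robin selector over Solr node URLs."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node):
        """Called when a health check fails; the node stops receiving queries."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Called when a known node passes its health check again."""
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        # Walk at most one full cycle looking for a healthy node.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy Solr nodes available")
```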
Here are some load balancers that you can deploy:
Logging keeps you informed about what is happening inside Solr, and Solr gives you granular control over the logging level of each component. It is very important that you:
For more about configuring logging, check out the Apache Solr Reference Guide to Configuring Logging.
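One quick way to see whether your log levels are tuned sensibly is to count how many entries each level produces. The sketch below assumes Solr's default log line layout, in which each entry starts with a `yyyy-MM-dd HH:mm:ss.SSS` timestamp followed by the level. Adjust the pattern if you have customized log4j2.xml. The sample line in the comment is made up.

```python
import re
from collections import Counter

# Matches the start of a default-format Solr log line, e.g. (made-up sample)
# "2023-02-08 10:15:00.123 ERROR (qtp-42) [c:products] o.a.s.h.RequestHandlerBase ..."
LOG_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b"
)


def count_levels(lines):
    """Count log entries per level; continuation lines such as stack traces are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage, assuming the service installer's default log location: `with open("/var/solr/logs/solr.log") as f: print(count_levels(f).most_common())`.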
Monitor every aspect of your Solr deployment regularly and consistently. A clear view of system-health metrics such as memory, system load average, and JVM heap helps you avoid CPU overload and out-of-memory errors. Collection-level metrics such as indexing performance, search latency, and errors during searching and indexing help you tune how your system behaves as its load changes.
You can do so by:
Solution providers such as SearchStax enable these features out of the box and publish their Monitoring documentation so you can see what to expect. These features are essential for a smoothly running deployment. SearchStax has also published a blog post on the four critical metrics for monitoring Solr health.
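If you run Solr yourself, you can poll Solr's Metrics API directly. The sketch below assumes a response from `/solr/admin/metrics?group=jvm` that has already been parsed from JSON. It expects heap figures under the `solr.jvm` registry as `memory.heap.used` and `memory.heap.max`, and the load average as `os.systemLoadAverage`. Verify those key names against your Solr version. The thresholds are hypothetical starting points, not recommendations.

```python
def jvm_health_warnings(metrics_response, heap_limit=0.85, load_per_cpu_limit=1.0, cpus=1):
    """Return human-readable warnings from a parsed Metrics API response."""
    jvm = metrics_response["metrics"]["solr.jvm"]
    warnings = []

    heap_used = jvm["memory.heap.used"]
    heap_max = jvm["memory.heap.max"]
    # The JVM reports -1 for the maximum heap when it is undefined.
    if heap_max > 0 and heap_used / heap_max > heap_limit:
        warnings.append(f"JVM heap at {heap_used / heap_max:.0%} of max")

    # The JVM reports a negative load average on platforms where it is unavailable.
    load = jvm.get("os.systemLoadAverage", -1)
    if load >= 0 and load / cpus > load_per_cpu_limit:
        warnings.append(f"load average {load:.2f} on {cpus} CPU(s)")

    return warnings
```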
It is crucial to protect your Solr data. Make regular backups of your data so you can restore your system quickly and efficiently in case of a failure.
It is important that you:
Backing up a multi-node system can also be complex: you have to mount a shared drive on all the nodes and then trigger backups using the Collections API's BACKUP action.
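Once the shared drive is mounted at the same path on every node, a backup is a single Collections API call. The sketch below only builds the request URL. The collection name, the `/mnt/solr-backups` location, and the naming scheme are hypothetical. The optional `async` parameter lets you poll for completion with the REQUESTSTATUS action instead of holding the HTTP connection open.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def backup_url(base_url, collection, location, now=None, async_id=None):
    """Build a Collections API BACKUP request with a timestamped backup name."""
    now = now or datetime.now(timezone.utc)
    params = {
        "action": "BACKUP",
        "name": f"{collection}-{now:%Y%m%d-%H%M%S}",
        "collection": collection,
        "location": location,  # must be a path every node can write to
    }
    if async_id is not None:
        params["async"] = async_id
    return f"{base_url}/admin/collections?{urlencode(params)}"
```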
It is hard to predict how large your Solr collection will grow when you are first starting out, so it is always advisable to move the data directory of your Solr deployment to a dedicated storage device that can be expanded as your data grows. All the major cloud providers let you attach an extra storage device to your instance and increase its size later if needed.
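To decide when to expand that device, compare its free space with how fast your index is growing. Here is a minimal sketch with Python's `shutil.disk_usage`. It assumes the data directory sits on the volume you pass in (the directory itself can be set, for example, through SOLR_DATA_HOME in solr.in.sh). The daily growth rate is a figure you would measure yourself, for example from daily disk readings.

```python
import shutil


def days_until_full(free_bytes, daily_growth_bytes):
    """Days of headroom left at the current growth rate (None if the index is not growing)."""
    if daily_growth_bytes <= 0:
        return None
    return free_bytes / daily_growth_bytes


def check_data_volume(path, daily_growth_bytes, warn_days=30):
    """Return (days_left, should_expand) for the volume holding the Solr data directory."""
    usage = shutil.disk_usage(path)
    days_left = days_until_full(usage.free, daily_growth_bytes)
    return days_left, days_left is not None and days_left < warn_days
```

For example, 100 GiB free with 2 GiB of daily growth gives `days_until_full(100 * 1024**3, 2 * 1024**3)` of 50.0 days.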
A typical Solr deployment runs on one or more servers that are reached through IP addresses or URLs and the ports on those systems. Network settings can introduce vulnerabilities into your architecture, so keep an eye on what is reachable from the internet and which exploits you might be inviting. For instance, even a seemingly small change such as the request header size in jetty.xml could expose you to a DDoS exploit.
When you deploy Solr in production, change only the settings you absolutely must. Run regular vulnerability scans on your servers to find any specific vulnerabilities that need to be addressed. Also stay up to date on newly discovered vulnerabilities and monitor the Apache Solr mailing lists. When a vulnerability affects the version of Solr you are running, apply the patch promptly.
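A quick sanity check on exposure is to try connecting to Solr's and Zookeeper's ports from a machine outside your network and confirm that the attempt fails. The sketch below makes a plain TCP connection attempt with Python's `socket` module. The ports are the Solr and Zookeeper defaults, and the host you check is up to you. This is no substitute for a proper vulnerability scan.

```python
import socket


def is_port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or unresolvable: treat as not reachable.
        return False


def exposed_ports(host, ports=(8983, 2181)):
    """List which of the given ports accept connections; run this from outside your network."""
    return [port for port in ports if is_port_reachable(host, port)]
```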
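To know whether a newly announced vulnerability affects you, you first need the exact version each node runs. Solr reports it from `/solr/admin/info/system`. The sketch below assumes that response has already been parsed from JSON and that the version sits under `lucene` → `solr-spec-version`. The minimum patched version is a hypothetical placeholder you would take from the relevant security advisory. This single-threshold check also ignores advisories that list separate fixed versions for different release lines.

```python
def parse_version(text):
    """Turn '9.1.0' (or '9.4.0-SNAPSHOT') into a comparable tuple such as (9, 1, 0)."""
    core = text.split("-", 1)[0]
    parts = [int(part) for part in core.split(".")]
    # Pad so that '9.1' and '9.1.0' compare as equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def needs_patch(system_info, minimum_patched):
    """True if the node's Solr version is older than the first patched release."""
    running = parse_version(system_info["lucene"]["solr-spec-version"])
    return running < parse_version(minimum_patched)
```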
It’s one thing to know what the best practices are. It’s another thing to actually put them into use. If you haven’t tried implementing these as part of your day-to-day Solr infrastructure operations, expect to go through many iterations before you get it right. However, you can shorten the learning curve by working with a partner such as SearchStax, who already knows how to navigate the challenges you might face. That way you get a solution that fits your needs, letting you focus on improving other parts of your operation.
Let us solve the technical aspects of your Solr infrastructure. You don’t have to go it alone trying to implement these best practices. Have a conversation with one of our Solr experts to discover how we can make that burden easier to manage.
SearchStax Managed Search is a fully managed SaaS solution that automates, manages, maintains and scales Solr search infrastructure in public or private clouds. We take care of Solr and make sure you have a reliable, secure and compliant Solr setup so you can focus on more value-added tasks.
Schedule a demo or start a free trial today.