February 07, 2023

Pete Navarra

5 min. read

Sitecore is one of the leading content management and marketing platforms on the market today. A standard Sitecore implementation uses a search engine for two purposes: Content Search and xConnect Search. Sitecore has supported Apache Solr since version 7.0, and Solr is now the recommended indexing engine.

Running and operating self-managed Solr environments is no easy task. With SearchStax Cloud, we take the hassle and frustration of running Solr operations out of your hands with a flexible and easy-to-deploy, easy-to-use service.

For those organizations who prefer to run self-managed Solr environments, there are many best practices to keep operations running as smoothly as possible. We highlight five guiding principles for optimizing Solr within a Sitecore deployment, developed first-hand by SearchStax’s dedicated Solr experts and engineers.

For specific guidance on implementing Sitecore, reach out to an implementation partner, your development team or Sitecore Support for detailed assistance. We are providing these suggestions as general guidelines for Sitecore developers who are using or planning to use Solr for indexing.

1. Avoid Using the Synchronous Index Strategy

The Synchronous Index Strategy is purpose-built to index one Sitecore item at a time. This often makes publishing operations slow, especially when publishing the full site.

Instead, switching to the Interval Asynchronous index update strategy will result in better performance and allow you to optimize resource consumption.

Sitecore has published a Knowledge Base article on this subject that applies to all versions of Sitecore between Version 7.0 and 9.3. This issue has been fixed and is not applicable in Sitecore versions 10 or higher.

2. Increase IntervalAsynchronous Interval

In conjunction with the previous guidance, the IntervalAsynchronous index strategy has an interval setting with a default value of 5 seconds. Increasing this interval means newly published content takes longer to become visible on the Content Delivery servers, but it also lets Sitecore batch index updates, which reduces the load on the Solr environment.

While Sitecore provides no specific guidance for this situation, we recommend setting this value to 2 minutes. Depending on how quickly your authors need published content to appear, you might need to adjust this interval.

				
<intervalAsyncMaster type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch" ...>
  <param desc="database">master</param>
  <param desc="interval">00:02:00</param>
  ...
</intervalAsyncMaster>
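
To put rough numbers on this trade-off, here is a small Python sketch that converts an interval written in the same hh:mm:ss form as the config value into an upper bound on index update runs per hour. It is back-of-the-envelope arithmetic, not Sitecore code: the helper names are hypothetical, and it assumes the strategy fires at most once per interval.

```python
from datetime import timedelta

def parse_interval(text: str) -> timedelta:
    """Parse an hh:mm:ss interval string such as "00:02:00"."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def max_runs_per_hour(interval: timedelta) -> float:
    """Upper bound on update runs per hour (assumes one run per interval)."""
    return timedelta(hours=1) / interval

# Compare the default interval with the value suggested in this article.
for label, text in [("default", "00:00:05"), ("suggested", "00:02:00")]:
    interval = parse_interval(text)
    print(f"{label}: up to {max_runs_per_hour(interval):.0f} runs/hour, "
          f"interval length {interval.total_seconds():.0f}s")
```

Under these assumptions, the 5-second default allows up to 720 update runs an hour, while a 2-minute interval caps it at 30. The cost is that a change may wait up to about one interval before indexing starts.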
				
			

3. Control the frequency of publish operations

Many Solr performance issues are caused by frequent, constant publishing operations. For example, if you publish one item every second (60 items a minute), Sitecore triggers a Solr commit every second, and each commit is an expensive operation for your Solr environment.
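
The effect of grouping publishes can be illustrated with a deliberately simplified model. The Python sketch below assumes that all publishes falling into the same time window share a single commit; this is an illustration of the idea, not a description of how Sitecore or Solr actually schedule commits, and the function name is hypothetical.

```python
def commits_needed(publish_times_s, window_s):
    """Count commits if every publish inside the same window shares one commit.

    publish_times_s: publish timestamps in whole seconds.
    window_s: length of the grouping window in seconds.
    """
    return len({t // window_s for t in publish_times_s})

# One publish per second for a minute, as in the example above.
one_minute_of_publishing = range(60)

print(commits_needed(one_minute_of_publishing, window_s=1))   # 60
print(commits_needed(one_minute_of_publishing, window_s=60))  # 1
```

In this model, the same 60 items cost 60 commits when published one by one and a single commit when published together.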

This recommendation may mean a change in content authoring and publishing behaviors which could be met with some resistance in your organization. 

Sitecore also provides numerous ways to control the publishing process via the Sitecore Content Publishing Service. The Content Publishing Service can drastically reduce the time your authors spend publishing while optimizing the publishing experience for both indexes and databases.

4. Increase the value of the ContentSearch.IndexUpdate.BatchSize

A consistent theme in the first three recommendations is how Sitecore updates Solr indexes in “batches”. Batching is a recommended practice. Depending on the type of content being published, the right batch size can lead to significant performance improvements.

Specifically, we are talking about these two settings related to ContentSearch:

<setting name="ContentSearch.ParallelIndexing.Enabled" value="true" />
<setting name="ContentSearch.IndexUpdate.BatchSize" value="300" />

With these default settings, Sitecore puts 300 items into a batch before updating the index, using multiple threads. Reaching 300 items might seem unlikely until you consider that related items and subitems are included, and that the count multiplies again for multi-language items. The number of items being published can quickly skyrocket, with negative consequences for Solr performance.
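
A quick calculation shows how fast the numbers grow. The Python sketch below uses a hypothetical model in which every published item brings along a fixed number of related items or subitems, and every item exists in every language. Real content trees are less uniform, so treat the result as an order-of-magnitude estimate only.

```python
import math

def index_documents(published_items, related_per_item, languages):
    """Rough count of index documents touched by a publish (hypothetical model)."""
    return published_items * (1 + related_per_item) * languages

def batches(documents, batch_size=300):
    """Number of batches needed at the given ContentSearch.IndexUpdate.BatchSize."""
    return math.ceil(documents / batch_size)

# Example input values are assumptions chosen for illustration.
docs = index_documents(published_items=50, related_per_item=3, languages=4)
print(docs)           # 800
print(batches(docs))  # 3
```

Here a publish of only 50 items already produces 800 index documents, which spans three batches at the default size of 300.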

Change this ContentSearch value carefully. The goal is to validate that your BatchSize suits your content, not simply to raise it.

For content where the amount of information per item is relatively low (for example, a small number of fields, or just a few rich text fields), increasing the batch size can result in better performance.

For content where the amount of information per item is very high, or where an item’s index document is very large, increasing this value can hurt indexing performance.

5. Reduce ContentSearch.SearchMaxResults

In addition to optimizations for indexing Sitecore content to Solr, there are also settings that can be tuned to improve search query performance. One of these settings is ContentSearch.SearchMaxResults.

<setting name="ContentSearch.SearchMaxResults" value="1000000" />

The default value is 1,000,000 results for every query. When a request is sent to Solr, this setting tells Solr to return up to 1,000,000 results, which appears as the ROWS parameter in the query. Solr must then allocate enough memory to hold 1,000,000 results. Because the setting applies to every Content Search API request from Sitecore to Solr, it can lead to severe memory overallocation and performance degradation on the Solr cluster.

Reducing this value to a lower number, such as 10,000, will drastically improve performance of your Solr cluster and thereby improve performance of your search results.

One drawback is that the search results are limited to 10,000 results which could impact the search experience. Sitecore has published a Knowledge Base article with more information on this setting which applies to all versions of Sitecore 9.0 and higher.
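
To see what the ROWS value looks like at the Solr level, here is a minimal Python sketch that builds a request to Solr's select handler asking for a single page of results instead of one very large result set. The host, port and core name are assumptions for illustration, and the helper function is hypothetical; only the q, start and rows parameters are standard Solr query parameters.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, query, page, page_size):
    """Build a Solr /select URL that requests only one page of results."""
    params = {
        "q": query,
        "start": page * page_size,  # offset of the first result to return
        "rows": page_size,          # maximum number of results to return
    }
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical local Solr core; adjust to your own environment.
url = solr_select_url(
    "http://localhost:8983/solr/sitecore_web_index",
    query="*:*",
    page=2,
    page_size=20,
)
print(url)
```

A request built this way asks Solr for 20 rows starting at offset 40, so Solr never has to prepare more results than the page actually displays.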

Learn more about best practices for Solr performance


Talk to an Expert or Get A Demo of SearchStax Cloud

Let us solve the technical aspects of your Solr for Sitecore infrastructure. With SearchStax Cloud, we focus on making sure you have a reliable, secure and compliant Solr setup. Create backups, set up disaster recovery and monitor Solr performance health with alerts, so you can focus on bigger initiatives. Talk to one of our experts or schedule a demo to learn more.

By Pete Navarra

VP, DXP Solutions

“…search should not only be for those organizations with massive search budgets...”
