20-second Timeout
In Solr 8.4, a new timeout setting was applied to inter-node replication. Replica updates can time out after only 20 seconds, leaving solr.log messages saying that task or request processing has “stalled,” like these:
2021-12-13 19:13:32.265 WARN (qtp496729294-22) [c:sitecore_master_index s:shard1 r:core_node4 x:sitecore_master_index_shard1_replica_n3] o.a.s.u.SolrCmdDistributor Unable to finish sending updates => java.io.IOException: Task queue processing has stalled for 20194 ms with 0 remaining elements to process.
2022-02-15 18:58:41.805 ERROR (qtp496729294-190188) [c:sitecore_master_index s:shard1 r:core_node4 x:sitecore_master_index_shard1_replica_n3] o.a.s.u.p.DistributedZkUpdateProcessor Setting up to try to start recovery on replica core_node2 with url http://10.22.9.165:8983/solr/sitecore_master_index_shard1_replica_n1/ by increasing leader term => java.io.IOException: Request processing has stalled for 20100ms with 100 remaining elements in the queue.
Note the timeout values just above 20000 ms. If you are using Solr 8.4 or later and see these timeout messages, contact SearchStax Support. We can reset the timing window to a larger value for you.
120-second Timeout
We also see the “processing has stalled” message in the log files when a replica update has failed. In this case, the timeout delay is around 120000 ms.
org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: 5 Async exceptions during distributed update:
Request processing has stalled for 119053ms with 100 remaining elements in the queue.
This happens when a Solr node is temporarily under high load, performing background tasks like replication or segment merges.
Temporarily reduce the indexing load to let the cluster catch up.
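To tell the two cases apart quickly, it can help to pull the stall durations out of solr.log and sort them into the 20-second and 120-second groups. The following is a minimal Python sketch, not a SearchStax tool. The names STALL_RE, Stall, classify and find_stalls are hypothetical, and the 60000 ms cutoff between the two groups is an assumption chosen only because it sits between the values seen in the messages above.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches both message variants seen in solr.log, e.g.
#   "Task queue processing has stalled for 20194 ms with 0 remaining elements"
#   "Request processing has stalled for 119053ms with 100 remaining elements"
STALL_RE = re.compile(
    r"(Task queue|Request) processing has stalled for (\d+)\s*ms "
    r"with (\d+) remaining elements"
)


class Stall(NamedTuple):
    kind: str        # "Task queue" or "Request"
    millis: int      # reported stall duration in milliseconds
    remaining: int   # elements still waiting in the queue
    window: str      # "20-second" or "120-second"


def classify(millis: int, cutoff_ms: int = 60_000) -> str:
    """Label a stall by the timeout it most likely hit.

    cutoff_ms is an assumed midpoint, not a Solr setting.
    """
    return "20-second" if millis < cutoff_ms else "120-second"


def find_stalls(lines: Iterable[str]) -> Iterator[Stall]:
    """Yield one Stall for every log line that reports a stalled queue."""
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            millis = int(match.group(2))
            yield Stall(
                kind=match.group(1),
                millis=millis,
                remaining=int(match.group(3)),
                window=classify(millis),
            )
```

To use it, pass an open log file to find_stalls, for example `for stall in find_stalls(open("solr.log", encoding="utf-8", errors="replace")): print(stall)`. Many "20-second" entries suggest the case where Support can widen the timing window. "120-second" entries point to a node under heavy load.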
Questions?
Do not hesitate to contact the SearchStax Support Desk.