Note: We have a glossary covering collections, cores, shards, and replicas.
As you might imagine, every SearchStax Managed Search service project is different. The size of the deployment (number of nodes, RAM, and disk space) depends on the number of documents, the size of the stored fields, the number of collections, indexing frequency, query load, and other factors that vary from one project to the next. The SearchStax Adoptions team helps premium clients estimate these needs during the onboarding process.
Deployment size is not set in stone. Once your team has gained some experience indexing and searching real data, it is easy to upgrade a deployment with more memory and disk, or with additional Solr nodes (servers).
Specific best practices for nodes, shards, and replicas are presented below.
Best Practice: Use at least two nodes!
A single-node system cannot provide high-availability/fault-tolerant behavior. If that one node fails, search goes down with it. Production systems should have at least two nodes.
Best Practice: Use one shard!
Sharding splits a very large index across multiple servers, but it introduces a great deal of complexity. It multiplies the number of servers required to achieve high-availability/fault-tolerant behavior, and it disables Managed Search's backup features. (Custom backups can be arranged for premium customers.)
If your index can fit comfortably on one server, then use one shard. This is Solr’s default behavior.
Best Practice: One replica per node!
To achieve high-availability/fault-tolerant behavior, every node of the cluster must have a replica of every collection. If some nodes lack replicas, backups and collection performance monitoring become unreliable, and a problem with a single node can take a collection out of service.
When you create the collection, set replicationFactor equal to the number of nodes in the cluster. Solr will automatically distribute the replicas to all nodes.
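As a minimal sketch, the two settings above map to the numShards and replicationFactor parameters of Solr's Collections API CREATE command. The host and collection name below are placeholders, not values from your deployment:

```python
from urllib.parse import urlencode

# Placeholder host; substitute your deployment's Solr endpoint.
SOLR_HOST = "https://example-host:8983"

def create_collection_url(name: str, num_shards: int = 1,
                          replication_factor: int = 1) -> str:
    """Build a Solr Collections API CREATE request URL."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                  # one shard keeps backups and operations simple
        "replicationFactor": replication_factor,  # one replica per node in the cluster
    })
    return f"{SOLR_HOST}/solr/admin/collections?{params}"

# For a two-node cluster: one shard, replicationFactor equal to the node count.
print(create_collection_url("products", num_shards=1, replication_factor=2))
```

Sending this request (for example with curl or a browser) creates the collection; Solr then places the replicas on the nodes automatically, as described above.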
Questions?
Do not hesitate to contact the SearchStax Support Desk.