Elasticsearch distributes data across nodes in a cluster using shards. For optimal performance at scale, it‘s critical that shards are evenly balanced across nodes. This in-depth tutorial explains how Elasticsearch shard allocation and rebalancing works.

Shard Architecture Fundamentals

Understanding Elasticsearch‘s shard architecture is key to tuning rebalancing performance. When creating an index, Elasticsearch partitions data into multiple logical shards based on a configured number of primary shards. For example:

PUT /logs
{
  "settings": {
    "number_of_shards": 3  
  }
}

This logs index is split into 3 primary shards, which allow parallelization of operations across shards.

By default, Elasticsearch also creates 1 replica shard per primary shard. Replicas provide redundancy but also allow load balancing of read operations across primaries and replicas.

Based on this default configuration, the logs index with 3 primary shards would result in:

  • 3 primary shards (P1, P2, P3)
  • 3 replica shards (R1, R2, R3)

So there are 6 total shards per index distributed across the cluster.

Shard Allocation Process

When nodes are added or removed from the cluster, Elasticsearch redistributes these shards amongst nodes to balance load. This automated process is handled by the shard allocation mechanism.

According to the Elasticsearch Architectural Guide, here are some key considerations made by the allocator:

  • Even distribution of shards across all nodes – For performance and resilience.
  • Awareness of hardware configurations – Balance memory intensive shards.
  • Replication of shard copies – Ensure high availability.
  • Avoiding node saturation – Prevent overload.

Based on these and other factors, the allocator assigns each shard to a node in the cluster.

Example Topology

For example, given a 3 node cluster with the logs index from before, the shard allocation may look as follows:

Node 1: 
    - Primary Shard 1
    - Replica Shard 3

Node 2:
   - Primary Shard 2
   - Replica Shard 1

Node 3:
   - Primary Shard 3
   - Replica Shard 2

This balanced distribution ensures replicas are allocated to different nodes than their corresponding primaries. Reads can scale across all 6 shards. If any single node fails, no data will be lost.

When is Rebalancing Needed?

As the cluster topology changes, the shard allocation can become imbalanced across nodes. Specific events that trigger an automatic rebalance include:

  • Nodes joining or leaving the cluster
  • Indices created or deleted
  • Nodes stopping or starting
  • Hardware changes to existing nodes

To illustrate, let‘s add 2 additional nodes to our example cluster:

Node 1: 
    - Primary Shard 1
    - Replica Shard 3

Node 2:
   - Primary Shard 2
   - Replica Shard 1

Node 3:
   - Primary Shard 3
   - Replica Shard 2

Node 4 (NEW)   

Node 5 (NEW)

Now shard allocation is imbalanced – Nodes 1-3 each have 2 shards, while Nodes 4-5 have none. Performance begins degrading.

By rebalancing shards across all 5 nodes, load can be balanced out:

Node 1:
    - Primary Shard 1

Node 2:    
   - Replica Shard 1

Node 3:
   - Primary Shard 2

Node 4:   
   - Replica Shard 2

Node 5:
   - Primary Shard 3
   - Replica Shard 3

Now the cluster is balanced again. This is automatically handled by Elasticsearch as nodes are added or removed.

Configuring Automatic Rebalancing

Elasticsearch provides configuration knobs to control the automatic rebalancing process. As we saw previously, this can be configured by updating cluster settings:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": "all",
    "cluster.routing.allocation.allow_rebalance": "always",
    "cluster.routing.allocation.cluster_concurrent_rebalance": 5  
  }
}

Some key settings include:

cluster.routing.rebalance.enable

Enables rebalancing for different shard types:

  • all – Rebalance both primaries and replicas (default)
  • primaries – Rebalance only primaries
  • replicas– Rebalance only replicas

cluster.routing.allocation.allow_rebalance

Controls when rebalancing is allowed

  • always – Allow at any time (default)
  • indices_primaries_active – Allow only when all primaries active
  • indices_all_active – Allow only when all shards active

cluster.routing.allocation.cluster_concurrent_rebalance

Maximum number of concurrent shard rebalances, default is 2.

Based on the use case, these settings provide granular control over the automated rebalancer. For example, you may only want primary rebalancing to occur during maintenance windows.

When is Manual Rebalancing Useful?

While automated rebalancing is preferred, there are some cases where manual rebalancing is helpful:

Node Draining – When decommissioning hardware, you can manually rebalance shards off the node before shutting it down gracefully.

Spotting Imbalances – If a node seems "hot" you can investigate and move shards accordingly.

Data Locality – Place certain indices/shards close to dependent services, if possible.

Manual rebalancing can be performed using the reroute API:

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-index",
        "shard": 0, 
        "from_node": "node1", 
        "to_node": "node2"   
      }
    }
  ]
} 

However, the cluster allocation mechanism may override manual changes to restore automatic balancing.

Monitoring & Optimizing Rebalancing

Carefully monitor shard allocation during topology changes or maintenance events. Look for:

  • Uneven shard distribution across nodes
  • Nodes exceeding capacity
  • Changes in cluster health
  • Query latency spikes

Tools like Kibana Reporting can track key metrics like document count per shard and nodes joining/leaving over time.

Tuning allocation awareness and capacity allocation settings can also optimize placement of heavy and light shards. Refer to Elastic‘s Capacity Planning Guide for best practices.

Conclusion

Elasticsearch automatically rebalances shards to provide evenly distributed load across nodes. This prevents hot spots and improves performance.

Carefully evaluate cluster events that trigger rebalancing. Monitor health metrics during topology changes. Tune advanced configuration settings for optimal control of the automated rebalancer.

While automated rebalancing is preferred, manual rebalancing serves some specialized use cases. Master Elasticsearch shard allocation and rebalancing to scale clusters efficiently.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *