Environment
The Elasticsearch cluster here is deployed on virtual machines and consists of 5 nodes in total, which will be scaled down to 3. The Elasticsearch version is 7.17.23, and the configuration is as follows:
Hostname | IP Address | OS Version | Elasticsearch Version |
---|---|---|---|
elasticsearch01 | 172.31.100.141 | Ubuntu server 24.04 LTS | 7.17.23 |
elasticsearch02 | 172.31.100.142 | Ubuntu server 24.04 LTS | 7.17.23 |
elasticsearch03 | 172.31.100.143 | Ubuntu server 24.04 LTS | 7.17.23 |
elasticsearch04 | 172.31.100.144 | Ubuntu server 24.04 LTS | 7.17.23 |
elasticsearch05 | 172.31.100.145 | Ubuntu server 24.04 LTS | 7.17.23 |
This time the Elasticsearch cluster needs to be scaled down by removing the nodes elasticsearch04 and elasticsearch05 from the cluster.
Scaling down the Elasticsearch cluster
Modify the Elasticsearch configuration file
Back up the elasticsearch.yml file:

```bash
cp /usr/local/elasticsearch/config/elasticsearch.yml /usr/local/elasticsearch/config/elasticsearch.yml.bak
```
Edit the elasticsearch.yml configuration file, remove "172.31.100.144" and "172.31.100.145" from the discovery.seed_hosts setting, and then restart the nodes one by one to apply the change.
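A minimal sketch of the relevant part of elasticsearch.yml on the three remaining nodes after the change (assuming the default transport port, so hosts are listed without ports; adjust to your actual file):

```yaml
# /usr/local/elasticsearch/config/elasticsearch.yml
# Only the three remaining nodes are kept as seed hosts;
# 172.31.100.144 and 172.31.100.145 have been removed.
discovery.seed_hosts: ["172.31.100.141", "172.31.100.142", "172.31.100.143"]
```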
Migrate the node data
Note: check for closed indices before draining the data.

In actual production operations, indices in the closed state cannot be relocated. In our case, after the scale-down was finished, the data of these indices turned out to be lost and unrecoverable. It is strongly recommended to check for such indices before any scale-down: open or delete every closed index before draining the data, otherwise their data will be lost during the scale-down.
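One way to list closed indices and reopen one of them (my-closed-index is a placeholder name; replace it with an index found by the first query):

```
# List all indices with their status; closed ones show "close" in the status column
GET /_cat/indices?v&h=index,status&s=status&expand_wildcards=all

# Reopen a closed index so its shards can be relocated (placeholder index name)
POST /my-closed-index/_open
```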
In Kibana, drain the data from the "172.31.100.144" and "172.31.100.145" nodes one after the other:
```
# Drain the data from the node
PUT /_cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.exclude._ip" : "172.31.100.145"
  },
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "172.31.100.145"
  }
}
```

Check the shard migration status (RELOCATING means a shard is being moved); if no shard is in the RELOCATING state, the data has been fully drained:

```
GET /_cat/shards?v&pretty&s=state:desc
```
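As an optional extra check (not part of the original procedure), the per-node shard counts can be listed; the nodes being removed should report 0 shards once draining is complete:

```
GET /_cat/allocation?v&h=shards,ip,node
```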
Once the data migration has finished, stop the Elasticsearch service on the drained node:

```bash
systemctl stop elasticsearch
```
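After the service is stopped, the node should drop out of the cluster; a quick confirmation in Kibana Dev Tools is to list the nodes and verify that the stopped one is no longer shown:

```
GET /_cat/nodes?v
```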
If the migration seems too slow, the following settings can be used to speed up the draining:
```
// Recovery transfer rate limit, default is 40mb per second
PUT /_cluster/settings
{
  "persistent" : {
    "indices.recovery.max_bytes_per_sec" : "200mb"
  },
  "transient" : {
    "indices.recovery.max_bytes_per_sec" : "200mb"
  }
}

// Number of concurrent shard recoveries allowed per node, default is 2
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": "5"
  },
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": "5"
  }
}

// Number of concurrent initial primary recoveries per node, default is 4
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": "5"
  },
  "transient": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": "5"
  }
}
```
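The currently applied values can be double-checked at any time with a read-only request:

```
GET /_cluster/settings?flat_settings=true
```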
Check the cluster status:

```
GET /_cluster/health?pretty
{
"cluster_name" : "es-cluster-logs",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 90,
"active_shards" : 180,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
```
Restore the cluster allocation settings
Revert the drain (allocation exclusion) setting to its original value:
```
PUT /_cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.exclude._ip" : ""
  },
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : ""
  }
}
```

Revert the recovery-related cluster settings to their original values:
```
// Recovery transfer rate limit, back to the default of 40mb per second
PUT /_cluster/settings
{
  "persistent" : {
    "indices.recovery.max_bytes_per_sec" : "40mb"
  },
  "transient" : {
    "indices.recovery.max_bytes_per_sec" : "40mb"
  }
}

// Concurrent shard recoveries per node, back to the default of 2
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": "2"
  },
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": "2"
  }
}

// Initial primary recoveries per node, back to the default of 4
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": "4"
  },
  "transient": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": "4"
  }
}
```

Verify that the node services are working correctly after the scale-down
```
# Check the cluster status
GET /_cluster/health?pretty
{
"cluster_name" : "es-cluster-logs",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 90,
"active_shards" : 180,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
# Check node information
GET /_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.31.100.143 41 98 3 0.27 0.34 0.36 cdfhilmrstw - elasticsearch03
172.31.100.141 12 98 1 0.05 0.10 0.12 cdfhilmrstw * elasticsearch01
172.31.100.142 47 98 0 0.14 0.19 0.18 cdfhilmrstw - elasticsearch02
# Check disk usage
GET /_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
60 21.2gb 47.3gb 443.7gb 491gb 9 172.31.100.143 172.31.100.143 elasticsearch03
60 18.6gb 44.2gb 446.7gb 491gb 9 172.31.100.141 172.31.100.141 elasticsearch01
60 14gb 40gb 451gb 491gb 8 172.31.100.142 172.31.100.142 elasticsearch02
```

Shut down the decommissioned Elasticsearch nodes
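On elasticsearch04 and elasticsearch05, stop the service and, assuming Elasticsearch runs as the same systemd unit used in the earlier stop step, disable it so it does not start again on reboot:

```bash
systemctl stop elasticsearch
systemctl disable elasticsearch
```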
Special note
When the active master node needs to be restarted, be sure to notify the business side before restarting. (Restarting the active master forces the cluster to elect a new master, which takes roughly a minute, and the cluster is unavailable during the election. Services that are particularly sensitive to the cluster should take special care.)
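To check which node is currently the elected master before such a restart (the "*" in the master column of GET /_cat/nodes, as in the output above, indicates the same thing):

```
GET /_cat/master?v
```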