SQL Study Notes: MySQL InnoDB Cluster

bugcat
2025-06-14

A Brief Look at High-Availability Solutions

Common high-availability solutions include:

  • Master-slave replication: built on the binary log (binlog), with three threads (the I/O thread, the dump thread, and the SQL thread) and three modes (asynchronous, semi-synchronous, and fully synchronous replication).

  • Group replication: several nodes together form a replication group, solving the data-inconsistency problems that asynchronous and semi-synchronous replication can cause.

  • InnoDB Cluster: built from MySQL Group Replication + MySQL Shell + MySQL Router.

  • Shared storage: multiple servers share the same storage (e.g. SAN, DRBD), and the mount point is switched over on failure.

We covered the master-slave solution in the previous chapter; this chapter covers MGR and the InnoDB Cluster.

InnoDB Cluster

This solution has three key components: MGR (MySQL Group Replication) as the synchronization core, InnoDB ClusterSet for highly reliable multi-node disaster tolerance, and InnoDB ReplicaSet for fast deployment and management.

MySQL Group Replication (MGR)

Synchronization Flow

  1. Before a transaction commits on the primary, a row-level Write Set is generated; the change is packaged into Binlog Events and written to the local binlog cache, after which the Group Replication hook fires and the group-communication phase begins.

  2. The primary's XCOM thread group picks up the transaction (binlog plus Write Set), and its proposer_task thread wraps the message into a Proposal that is broadcast to every other node.

  3. Each node's XCOM thread group validates the primary's Proposal via its acceptor_learner_task thread, then votes on the proposal and returns an ACK message.

  4. The primary's XCOM thread group listens for Learned messages via executor_task and counts the returned ACKs. If more than half of the nodes approve the update, consensus is reached: the transaction is marked globally ordered and placed into every node's incoming queue. Without a majority, the proposal is discarded.

  5. Each node's Applier thread takes the transaction off the incoming queue while the Certifier thread performs Write Set conflict detection; the storage-engine interface is then called to apply the change, and finally the MySQL server layer writes the binlog.

  6. When a new node joins, it starts a Recovery thread and sends a join request to a seed node to obtain the group membership view. Once the seed returns the view, the new node selects a suitable Donor, which streams the missing binlog transactions to it over a standard asynchronous replication channel.
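The majority vote in step 4 can be sketched like this (a minimal simulation of the quorum rule, not the actual XCOM code; the function names are made up for illustration):

```python
def has_quorum(acks: int, group_size: int) -> bool:
    """A proposal passes once MORE than half of the group members have ACKed it."""
    return acks > group_size // 2

# A 3-node group (primary + 2 secondaries); the primary's own vote counts too.
print(has_quorum(2, 3))  # True  -> marked globally ordered, queued on every node
print(has_quorum(1, 3))  # False -> proposal discarded

# The same rule is why a group of n members tolerates floor((n - 1) / 2) failures.
def max_tolerated_failures(group_size: int) -> int:
    return (group_size - 1) // 2

print(max_tolerated_failures(3))  # 1
print(max_tolerated_failures(5))  # 2
```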

Hands-On Test

The environment is, as before, a K8S cluster. This time we create four nodes: one MASTER, two NODE instances, and one spare NODE.

As usual, here are the YAML manifests for each node first.

MASTER

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-master
  namespace: sql
spec:
  selector:
    matchLabels:
      app: mysql-master # has to match .spec.template.metadata.labels
  serviceName: "mysql"
  replicas: 1 # by default is 1
  template:
    metadata:
      labels:
        app: mysql-master # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mysql-master
        image: kubernetes-harbor-registry.bugcat.com:80/library/mysql:8.0
        ports:
        - containerPort: 3306
          name: mysql
        - name: mgr
          containerPort: 33061
        - name: mysqlx
          containerPort: 33060
        - name: shell
          containerPort: 33062
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: conf
          mountPath: /etc/my.cnf
          subPath: my.cnf
        env:
        - name: MYSQL_DATABASE
          value: "test"
        - name: MYSQL_ROOT_PASSWORD
          value: "111111"
      volumes:
      - name: conf
        configMap:
          name: conf
          items:
            - key: my.cnf
              path: my.cnf
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "rook-cephfs"
      resources:
        requests:
          storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-master
  namespace: sql
spec:
  selector:
    app: mysql-master
  clusterIP: None
  ports:
  - name: mysql-master
    protocol: TCP
    port: 3306
  - name: mgr
    protocol: TCP
    port: 33061
  - name: mysqlx
    port: 33060
  - name: shell
    port: 33062
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: conf
  namespace: sql
data:
  my.cnf: |
    [mysqld]
    server_id = 100
    log-bin = /var/lib/mysql/2master-bin
    binlog_format = ROW
    bind-address = 0.0.0.0
    log_error = /var/lib/mysql/error.log
    relay_log = /var/lib/mysql/relaylog
    gtid-mode = ON
    enforce-gtid-consistency = true

    transaction_write_set_extraction=XXHASH64
    loose-group_replication_group_name='bd5bb53d-0d98-4fca-9e95-fd0dc0083132'
    loose-group_replication_start_on_boot=off
    loose-group_replication_ip_whitelist='0.0.0.0/0'
    loose-group_replication_local_address='mysql-master:33061'
    loose-group_replication_group_seeds='mysql-master:33061,mysql-node-1:33061,mysql-node-2:33061'
    loose-group_replication_bootstrap_group=off
    loose-group_replication_single_primary_mode=on
    loose-group_replication_enforce_update_everywhere_checks=off
    loose-group_replication_recovery_get_public_key=on
    loose-group_replication_recovery_use_ssl=off
    loose-group_replication_ssl_mode='DISABLED'
    loose-group_replication_consistency='EVENTUAL'
    loose-group_replication_member_expel_timeout=5
    report_host='mysql-master'
    report_port=3306
    plugin_load_add='group_replication.so'
    auto_increment_increment=1
    auto_increment_offset=1
    mysqlx_port=33060
    admin_port=33062

For the remaining nodes I'll only show the ConfigMap YAML; they just need different values for server_id, loose-group_replication_local_address, and report_host.
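Since only three settings differ between nodes, the per-node overrides can be rendered with a small script (a convenience sketch; the server_id values for node-2 and node-3 are my assumption, continuing from the 100 and 101 used in this article):

```python
# Render the per-node my.cnf overrides; everything else in the ConfigMap is shared.
def node_overrides(server_id: int, host: str) -> str:
    return "\n".join([
        f"server_id = {server_id}",
        f"loose-group_replication_local_address='{host}:33061'",
        f"report_host='{host}'",
    ])

# server_id 100/101 come from the article; 102 and 103 are assumed for the rest.
for sid, host in [(101, "mysql-node-1"), (102, "mysql-node-2"), (103, "mysql-node-3")]:
    print(f"# --- {host} ---")
    print(node_overrides(sid, host))
```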

Node-1/2/3

kind: ConfigMap
apiVersion: v1
metadata:
  name: node-1-conf
  namespace: sql
data:
  my.cnf: |
    [mysqld]
    server_id = 101
    log_error = /var/lib/mysql/error.log
    log-bin = /var/lib/mysql/2masterbin
    binlog_format = ROW
    log_slave_updates = ON
    gtid_mode = ON
    enforce_gtid_consistency = ON

    transaction_write_set_extraction=XXHASH64
    loose-group_replication_group_name='bd5bb53d-0d98-4fca-9e95-fd0dc0083132'
    loose-group_replication_start_on_boot=off
    loose-group_replication_ip_whitelist='0.0.0.0/0'
    loose-group_replication_local_address='mysql-node-1:33061'
    loose-group_replication_group_seeds='mysql-master:33061,mysql-node-1:33061,mysql-node-2:33061'
    loose-group_replication_bootstrap_group=off
    loose-group_replication_single_primary_mode=on
    loose-group_replication_enforce_update_everywhere_checks=off
    loose-group_replication_recovery_get_public_key=on
    loose-group_replication_recovery_use_ssl=off
    loose-group_replication_ssl_mode='DISABLED'
    loose-group_replication_consistency='EVENTUAL'
    loose-group_replication_member_expel_timeout=5
    report_host='mysql-node-1'
    report_port=3306
    plugin_load_add='group_replication.so'
    auto_increment_increment=1
    auto_increment_offset=1
    mysqlx_port=33060
    admin_port=33062

Once deployed, go into all the nodes and create a replication account on each:

mysql> create user 'repl'@'%' identified with mysql_native_password by '111111';
Query OK, 0 rows affected (0.10 sec)
mysql> grant replication slave,replication client, backup_admin on *.* to 'repl'@'%';
Query OK, 0 rows affected (0.06 sec)
mysql> flush privileges;

Then check whether the Group Replication plugin is installed; if it isn't, install it with the second command:

mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS  WHERE PLUGIN_NAME LIKE '%group_replication%';
+-------------------+---------------+
| PLUGIN_NAME       | PLUGIN_STATUS |
+-------------------+---------------+
| group_replication | ACTIVE        |
+-------------------+---------------+
mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';

Next, configure the replication channel on three of the nodes; leave the last node untouched as a spare.

mysql> change master to master_user = 'repl',
    -> master_password = '111111' for channel 'group_replication_recovery';
mysql> SELECT User_name,Channel_name FROM mysql.slave_master_info WHERE user_name = 'repl';
+-----------+----------------------------+
| User_name | Channel_name               |
+-----------+----------------------------+
| repl      | group_replication_recovery |
+-----------+----------------------------+
mysql> SET global group_replication_recovery_get_public_key=ON;
Query OK, 0 rows affected (0.00 sec)

Then start the group. Bootstrapping requires a separate step performed on the master node only:

mysql> set GLOBAL group_replication_bootstrap_group=ON;
mysql> start GROUP_REPLICATION;
mysql> set GLOBAL group_replication_bootstrap_group=OFF;
mysql> select * FROM performance_schema.replication_group_members\G
*************************** 1. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 7a9c9f1d-482d-11f0-97df-96f01138d258
               MEMBER_HOST: mysql-master
               MEMBER_PORT: 3306
              MEMBER_STATE: ONLINE
               MEMBER_ROLE: PRIMARY
            MEMBER_VERSION: 8.0.42
MEMBER_COMMUNICATION_STACK: XCom

Next, go into the other two nodes and join the group:

mysql> START GROUP_REPLICATION;

mysql> select * FROM performance_schema.replication_group_members\G
*************************** 1. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 7a9c9f1d-482d-11f0-97df-96f01138d258
               MEMBER_HOST: mysql-master
               MEMBER_PORT: 3306
              MEMBER_STATE: ONLINE
               MEMBER_ROLE: PRIMARY
            MEMBER_VERSION: 8.0.42
MEMBER_COMMUNICATION_STACK: XCom
*************************** 2. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 811f3a2e-482d-11f0-b9c2-b248fa7f81ce
               MEMBER_HOST: mysql-node-1
               MEMBER_PORT: 3306
              MEMBER_STATE: RECOVERING
               MEMBER_ROLE: SECONDARY
            MEMBER_VERSION: 8.0.42
MEMBER_COMMUNICATION_STACK: XCom
*************************** 3. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 8122dec0-482d-11f0-837b-beb046169ce8
               MEMBER_HOST: mysql-node-2
               MEMBER_PORT: 3306
              MEMBER_STATE: RECOVERING
               MEMBER_ROLE: SECONDARY
            MEMBER_VERSION: 8.0.42
MEMBER_COMMUNICATION_STACK: XCom
3 rows in set (0.00 sec)

And that's MGR configured. Doesn't it feel leaner than classic MySQL replication? But our goal is an InnoDB Cluster, so next up is installing the InnoDB Cluster.

InnoDB Cluster

MySQL Shell

First we install MySQL Shell on the master node, our control node. It exposes MySQL's admin API, which makes installing the cluster faster and day-to-day operations easier.

Download the MySQL Shell tarball from the official site (MySQL :: Download MySQL Shell) and upload it into the container. I extracted it in advance and copied it in, so only the owner, permissions, and environment variable need to be set:

bash-5.1# chown -R mysql:mysql mysql-shell-8.0.42-linux-glibc2.28-x86-64bit/
bash-5.1# echo "export PATH=\$PATH:/var/lib/mysql/mysql-shell-8.0.42-linux-glibc2.28-x86-64bit/bin" >> /etc/profile
bash-5.1# source /etc/profile

Now let's use MySQL Shell to create an InnoDB Cluster quickly. Launch it with mysqlsh; if you already have an MGR group running, a single var cluster = dba.createCluster('testCluster', {adoptFromGR: true}); is all it takes:

[root@mysql-master-0 mysql]# mysqlsh -P 3306 -u root -p
 MySQL  localhost:3306 ssl  JS > var cluster = dba.createCluster('testCluster', {adoptFromGR: true});
A new InnoDB Cluster will be created based on the existing replication group on instance 'mysql-master:3306'.
Creating InnoDB Cluster 'testCluster' on 'mysql-master:3306'...

Adding Seed Instance...
Adding Instance 'mysql-node-2:3306'...
Adding Instance 'mysql-master:3306'...
Resetting distributed recovery credentials across the cluster...
Cluster successfully created based on existing replication group.

Check the InnoDB Cluster status:

 MySQL  localhost:3306 ssl  JS > dba.getCluster("testCluster").status();
{
    "clusterName": "testCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-master:3306",
        "ssl": "DISABLED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql-master:3306": {
                "address": "mysql-master:3306",
                "instanceErrors": [
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            },
            "mysql-node-1:3306": {
                "address": "mysql-node-1:3306",
                "instanceErrors": [
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            },
            "mysql-node-2:3306": {
                "address": "mysql-node-2:3306",
                "instanceErrors": [
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-master:3306"
}

status: OK, the cluster was created successfully.

MySQL Router

MySQL Router implements read/write splitting through routing. It automatically detects the new primary and redirects traffic when the current primary fails, and it tracks cluster topology changes in real time, picking up newly joined nodes.

Installing MySQL Router works just like the Shell: copy the extracted files into the container, set the owner and permissions, then set the PATH.

[root@mysql-master-0 mysql]# echo "export PATH=\$PATH:/var/lib/mysql/mysql-router-8.0.42-linux-glibc2.28-x86_64/bin" >> /etc/profile
[root@mysql-master-0 mysql]# source /etc/profile

Now initialize MySQL Router: --bootstrap points at the local node, and --conf-bind-address takes your service IP address.

Let me explain what this service IP address is. Suppose you have a web service at IP 11.1.0.1 and install MySQL Router on that web service's host: --conf-bind-address is then set to that service's address, and Router listens on its port 6446 (read/write) and port 6447 (read-only). If you then have Nginx reverse-proxy to the web service's IP on port 6447, Router steers the traffic coming in from Nginx to the InnoDB Cluster's read-only members, achieving read/write splitting.
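The routing rule described above boils down to a mapping from listening port to member role. A simplified sketch of the idea (not Router's actual implementation; route_backend is a made-up helper):

```python
# MySQL Router's default classic-protocol ports and the member role each targets.
ROUTES = {
    6446: "PRIMARY",    # read/write port -> current primary
    6447: "SECONDARY",  # read-only port  -> the secondaries
}

def route_backend(port: int, topology: dict) -> str:
    """Pick a backend for an incoming connection based on the listening port."""
    role = ROUTES[port]
    candidates = [host for host, r in topology.items() if r == role]
    return candidates[0]  # the real Router load-balances; we just take the first

topology = {"mysql-master": "PRIMARY",
            "mysql-node-1": "SECONDARY",
            "mysql-node-2": "SECONDARY"}
print(route_backend(6446, topology))  # mysql-master
print(route_backend(6447, topology))  # mysql-node-1
```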

So in practice, MySQL Router only needs to be installed on your application servers; the database nodes don't need it. For demonstration purposes, though, I'll test against the MySQL instances on node-2 and the master.

[root@mysql-node-2-0 /]# mysqlrouter --bootstrap root:111111@mysql-node-2 --directory /var/lib/mysql/mysqlrouter --conf-use-sockets --user=mysql --name=mysql_router_1 --conf-bind-address=mysql-node-2 --account-host="%"  --force
# Reconfiguring MySQL Router 8.0.42 (MySQL Community - GPL) instance at '/var/lib/mysql/mysqlrouter'...
- Fetching password for current account (mysql_router1_krx8jyl) from keyring
- Creating account(s) (only those that are needed, if any)
Fetching Cluster Members
trying to connect to mysql-server at mysql-master:3306
- Creating account(s) (only those that are needed, if any)
- Using existing certificates from the '/var/lib/mysql/mysqlrouter/data' directory
- Verifying account (using it to run SQL queries that would be run by Router)
- Storing account in keyring
- Adjusting permissions of generated files
- Creating configuration /var/lib/mysql/mysqlrouter/mysqlrouter.conf
# MySQL Router 'mysql_router_1' configured for the InnoDB Cluster 'testCluster'
After this MySQL Router has been started with the generated configuration
    $ mysqlrouter -c /var/lib/mysql/mysqlrouter/mysqlrouter.conf
InnoDB Cluster 'testCluster' can be reached by connecting to:

## MySQL Classic protocol
- Read/Write Connections: localhost:6446, /var/lib/mysql/mysqlrouter/mysql.sock
- Read/Only Connections:  localhost:6447, /var/lib/mysql/mysqlrouter/mysqlro.sock

## MySQL X protocol
- Read/Write Connections: localhost:6448, /var/lib/mysql/mysqlrouter/mysqlx.sock
- Read/Only Connections:  localhost:6449, /var/lib/mysql/mysqlrouter/mysqlxro.sock

You can browse the generated routing configuration in the target directory, but there's nothing to troubleshoot yet, so let's just start MySQL Router. Because I'm running inside a container, sudo has to be removed first; sed -i replaces it with nothing:

bash-5.1# sed -i 's/sudo//g' /var/lib/mysql/mysqlrouter/start.sh
bash-5.1# bash /var/lib/mysql/mysqlrouter/start.sh  2>&1 > /dev/null &
[1] 580
bash-5.1# PID 582 written to '/var/lib/mysql/mysqlrouter/mysqlrouter.pid'
stopping to log to the console. Continuing to log to filelog
[1]+  Done                    bash /var/lib/mysql/mysqlrouter/start.sh 2>&1 > /dev/null

Now connect to the local MySQL service for a test, but this time through MySQL Router's read/write-split ports, together with the service host we specified.

The read port

Here's the interesting part: even when logging in on the MySQL master node with mysql -u root -p, as long as we target Router's read port and the service host, nothing can be written. Querying which node we actually landed on reveals the MySQL service of mysql-node-1-0, and querying the session user reveals the account Router created at bootstrap time:

bash-5.1# mysql -u root -p -P 6447 -h mysql-master
mysql> show databases;
+-------------------------------+
| Database                      |
+-------------------------------+
| information_schema            |
| mysql                         |
| mysql_innodb_cluster_metadata |
| performance_schema            |
| sys                           |
| test                          |
| test_2                        |
+-------------------------------+
7 rows in set (0.00 sec)
mysql> drop database test_2;
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement
mysql> select @@hostname as hostname, @@port as port;
+----------------+------+
| hostname       | port |
+----------------+------+
| mysql-node-1-0 | 3306 |
+----------------+------+
1 row in set (0.00 sec)
mysql> select user();
+------------------------------------------------------+
| user()                                               |
+------------------------------------------------------+
| root@10-244-8-146.mysql-master.sql.svc.cluster.local |
+------------------------------------------------------+
1 row in set (0.00 sec)

The write port

As with the read port, let's try dropping a database: this time it succeeds. Checking the current node and user shows we've been switched to the MySQL service on the master node, again under the user Router created earlier:

bash-5.1# mysql -u root -p -P 6446 -h mysql-master
mysql> drop database test_2;
Query OK, 0 rows affected (0.20 sec)
mysql> select @@hostname as hostname, @@port as port;
+----------------+------+
| hostname       | port |
+----------------+------+
| mysql-master-0 | 3306 |
+----------------+------+
1 row in set (0.00 sec)
mysql> show databases;
+-------------------------------+
| Database                      |
+-------------------------------+
| information_schema            |
| mysql                         |
| mysql_innodb_cluster_metadata |
| performance_schema            |
| sys                           |
| test                          |
+-------------------------------+
6 rows in set (0.00 sec)
mysql> select user();
+-------------------------------------------------+
| user()                                          |
+-------------------------------------------------+
| root@mysql-master-0.mysql.sql.svc.cluster.local |
+-------------------------------------------------+

That completes the InnoDB Cluster installation. But don't forget, we still have a spare node standing by; next, let's look at how to add a node quickly.

Adding a Node

We'll use the spare node prepared earlier; if you don't have a standby node-3, go back to the MGR hands-on section.

The method is simple: first, on the new node, grant node-3's replication user repl sufficient privileges, otherwise the join will fail (see the cases under Common Deployment Errors for details).

mysql> grant all on *.* to 'repl'@'%';
mysql> GRANT CLONE_ADMIN, CONNECTION_ADMIN, CREATE USER, EXECUTE, FILE, GROUP_REPLICATION_ADMIN, PERSIST_RO_VARIABLES_ADMIN, PROCESS, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, REPLICATION_APPLIER, REPLICATION_SLAVE_ADMIN, ROLE_ADMIN, SELECT, SHUTDOWN, SYSTEM_VARIABLES_ADMIN ON *.* TO 'repl'@'%' WITH GRANT OPTION;
mysql> GRANT DELETE, INSERT, UPDATE ON mysql.* TO 'repl'@'%' WITH GRANT OPTION;
mysql> grant REPLICATION_SLAVE_ADMIN on *.* to 'repl'@'%';

Then add the instance. Plenty of errors can show up here; if you get none, carry on, and if you do, check the common errors below.

 MySQL  mysql-master:3306 ssl  JS > dba.getCluster("testCluster").addInstance("repl@mysql-node-3:3306", {     password: "111111",     recoveryMethod: "clone",   });
WARNING: The password option is deprecated and will be removed in a future release.


WARNING: A GTID set check of the MySQL instance at 'mysql-node-3:3306' determined that it contains transactions that do not originate from the cluster, which must be discarded before it can join the cluster.

mysql-node-3:3306 has the following errant GTIDs that do not exist in the cluster:
812260f5-482d-11f0-bb8f-7234986dc8ac:1-7

WARNING: Discarding these extra GTID events can either be done manually or by completely overwriting the state of mysql-node-3:3306 with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.

Having extra GTID events is not expected, and it is recommended to investigate this further and ensure that the data can be removed prior to choosing the clone recovery method.
Clone based recovery selected through the recoveryMethod option

Validating instance configuration at mysql-node-3:3306...

This instance reports its own address as mysql-node-3:3306

Instance configuration is suitable.
NOTE: Group Replication will communicate with other members using 'mysql-node-3:33061'. Use the localAddress option to override.

* Checking connectivity and SSL configuration...
A new instance will be added to the InnoDB Cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster...

NOTE: User 'mysql_innodb_cluster_4'@'%' already existed at instance 'mysql-master:3306'. It will be deleted and created again with a new password.
Monitoring recovery process of the new cluster member. Press ^C to stop monitoring and let it continue in background.
Clone based state recovery is now in progress.

NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.

* Waiting for clone to finish...
NOTE: mysql-node-3:3306 is being cloned from mysql-master:3306
** Stage DROP DATA: Completed
** Clone Transfer
    FILE COPY  ############################################################  100%  Completed
    PAGE COPY  ############################################################  100%  Completed
    REDO COPY  ############################################################  100%  Completed

NOTE: mysql-node-3:3306 is shutting down...

* Waiting for server restart... ready
* mysql-node-3:3306 has restarted, waiting for clone to finish...
** Stage RESTART: Completed
* Clone process has finished: 79.95 MB transferred in about 1 second (~79.95 MB/s)

State recovery already finished for 'mysql-node-3:3306'

The instance 'mysql-node-3:3306' was successfully added to the cluster.
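The "errant GTID" warning in the log above is, at its core, a set difference: any transaction in the joiner's GTID_EXECUTED that is not in the cluster's set is errant. A toy model with Python sets (real GTID sets are interval-compressed, and errant_gtids is just an illustrative name):

```python
def errant_gtids(joiner: set, cluster: set) -> set:
    """GTIDs the joining node has executed but the cluster has not."""
    return joiner - cluster

cluster = {f"uuid-cluster:{i}" for i in range(1, 20)}
# The spare node ran some local transactions (e.g. the grants we issued on it),
# recorded under its own server UUID, before joining:
joiner = cluster | {f"uuid-node-3:{i}" for i in range(1, 8)}

extra = errant_gtids(joiner, cluster)
print(len(extra))  # 7 errant transactions; clone recovery overwrites them
```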

Finally, a look at the status:

            "mysql-node-3:3306": {
                "address": "mysql-node-3:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-master:3306"

Perfect.

Common Deployment Errors

An InnoDB Cluster node fails to apply a transaction

Checking the cluster status reveals instance errors of the form "cannot execute statement X". This happens when the same change was already written locally, or when a prerequisite transaction is missing, for example the CREATE DATABASE transaction was lost and the CREATE TABLE then runs with nothing to run against. You can either execute the missing transaction by hand or skip the failing one; the approach I'll show here is skipping it.

 MySQL  localhost:3306 ssl  JS > cluster.status()

{
    "clusterName": "testCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-master:3306",
        "ssl": "DISABLED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure. 2 members are not active.",
        "topology": {
            "mysql-master:3306": {
                "address": "mysql-master:3306",
                "instanceErrors": [
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            },
            "mysql-node-1:3306": {
                "address": "mysql-node-1:3306",
                "instanceErrors": [
                    "ERROR: applier thread of Group Replication Recovery channel stopped with an error: Worker 1 failed executing transaction '85c65aa2-487e-11f0-ba56-a6771ebc8e79:7' at source log 2master-bin.000003, end_log_pos 480; Error 'Operation CREATE USER failed for 'repl'@'%'' on query. Default database: ''. Query: 'CREATE USER 'repl'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*FD571203974BA9AFE270FE62151AE967ECA5E0AA'' (1396) at 2025-06-13 18:52:58.271584",
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "SECONDARY",
                "mode": "n/a",
                "readReplicas": {},
                "recovery": {
                    "applierError": "Worker 1 failed executing transaction '85c65aa2-487e-11f0-ba56-a6771ebc8e79:7' at source log 2master-bin.000003, end_log_pos 480; Error 'Operation CREATE USER failed for 'repl'@'%'' on query. Default database: ''. Query: 'CREATE USER 'repl'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*FD571203974BA9AFE270FE62151AE967ECA5E0AA''",
                    "applierErrorNumber": 1396,
                    "state": "APPLIER_ERROR"
                },
                "recoveryStatusText": "Distributed recovery in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.42"
            },
            "mysql-node-2:3306": {
                "address": "mysql-node-2:3306",
                "instanceErrors": [
                    "ERROR: applier thread of Group Replication Recovery channel stopped with an error: Worker 1 failed executing transaction '85c65aa2-487e-11f0-ba56-a6771ebc8e79:7' at source log 2master-bin.000003, end_log_pos 480; Error 'Operation CREATE USER failed for 'repl'@'%'' on query. Default database: ''. Query: 'CREATE USER 'repl'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*FD571203974BA9AFE270FE62151AE967ECA5E0AA'' (1396) at 2025-06-13 18:52:59.025042",
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "SECONDARY",
                "mode": "n/a",
                "readReplicas": {},
                "recovery": {
                    "applierError": "Worker 1 failed executing transaction '85c65aa2-487e-11f0-ba56-a6771ebc8e79:7' at source log 2master-bin.000003, end_log_pos 480; Error 'Operation CREATE USER failed for 'repl'@'%'' on query. Default database: ''. Query: 'CREATE USER 'repl'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*FD571203974BA9AFE270FE62151AE967ECA5E0AA''",
                    "applierErrorNumber": 1396,
                    "state": "APPLIER_ERROR"
                },
                "recoveryStatusText": "Distributed recovery in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.42"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-master:3306"
}

We choose to skip the transaction. Since the cluster is already up and the secondaries are read-only, connect with MySQL Shell and switch to SQL mode, stop MGR on the affected secondary, skip the transaction, then restart MGR. Taking node-2's failing transaction as the example, first copy the GTID from the error above:

[root@mysql-master-0 mysql]# mysqlsh -P 3306 -u root -p -h mysql-node-2
 MySQL  mysql-node-2:3306 ssl  JS > \sql
 MySQL  mysql-node-2:3306 ssl  SQL > set gtid_next = '85c65aa2-487e-11f0-ba56-a6771ebc8e79:7';
Query OK, 0 rows affected (0.0009 sec)
 MySQL  mysql-node-2:3306 ssl  SQL > BEGIN;
Query OK, 0 rows affected (0.0007 sec)
 MySQL  mysql-node-2:3306 ssl  ★  SQL > COMMIT;
Query OK, 0 rows affected (0.0091 sec)
 MySQL  mysql-node-2:3306 ssl  SQL > SET GTID_NEXT = 'AUTOMATIC';
Query OK, 0 rows affected (0.0008 sec)
 MySQL  mysql-node-2:3306 ssl  SQL > start group_replication;
Query OK, 0 rows affected, 1 warning (9.4101 sec)
Warning (code 1287): 'group_replication_ip_whitelist' is deprecated and will be removed in a future release. Please use group_replication_ip_allowlist instead
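Why does an empty BEGIN/COMMIT under a pinned gtid_next skip the failing event? Because the applier only checks whether a GTID is already present in GTID_EXECUTED; once the GTID is recorded, the transaction is treated as done. A toy model (the apply function here is illustrative, not a server API):

```python
def apply(gtid: str, executed: set) -> str:
    """The applier silently skips any GTID already present in GTID_EXECUTED."""
    if gtid in executed:
        return "skipped"
    executed.add(gtid)
    return "applied"

executed = set()
bad = "85c65aa2-487e-11f0-ba56-a6771ebc8e79:7"

# SET gtid_next = '<bad>'; BEGIN; COMMIT;  records the GTID with no data change:
apply(bad, executed)

# When the recovery channel retries that same transaction, it is now skipped:
print(apply(bad, executed))  # skipped
```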

Then switch back to JS mode and check the cluster again:

MySQL  mysql-node-2:3306 ssl  SQL > \js
Switching to JavaScript mode... 
MySQL  mysql-node-2:3306 ssl  JS > dba.getCluster("testCluster").status();
{
    "clusterName": "testCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-master:3306",
        "ssl": "DISABLED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql-master:3306": {
                "address": "mysql-master:3306",
                "instanceErrors": [
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            },
            "mysql-node-1:3306": {
                "address": "mysql-node-1:3306",
                "instanceErrors": [
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            },
            "mysql-node-2:3306": {
                "address": "mysql-node-2:3306",
                "instanceErrors": [
                    "NOTE: The required parallel-appliers settings are not enabled on the instance. Use dba.configureInstance() to fix it."
                ],
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-master:3306"

Status: OK, neatly solved.

A new node joining the cluster reports insufficient privileges

The problem looks like this:

 MySQL  mysql-master:3306 ssl  JS > cluster.addInstance("repl@mysql-node-3:3306", {     password: "111111",     recoveryMethod: "clone",   });
WARNING: The password option is deprecated and will be removed in a future release.

WARNING: A GTID set check of the MySQL instance at 'mysql-node-3:3306' determined that it contains transactions that do not originate from the cluster, which must be discarded before it can join the cluster.
mysql-node-3:3306 has the following errant GTIDs that do not exist in the cluster:
812260f5-482d-11f0-bb8f-7234986dc8ac:1-11
WARNING: Discarding these extra GTID events can either be done manually or by completely overwriting the state of mysql-node-3:3306 with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.
Having extra GTID events is not expected, and it is recommended to investigate this further and ensure that the data can be removed prior to choosing the clone recovery method.
Clone based recovery selected through the recoveryMethod option

Validating instance configuration at mysql-node-3:3306...
ERROR: The account 'repl'@'%' is missing privileges required to manage an InnoDB cluster:
GRANT CLONE_ADMIN, CONNECTION_ADMIN, CREATE USER, EXECUTE, FILE, GROUP_REPLICATION_ADMIN, PERSIST_RO_VARIABLES_ADMIN, PROCESS, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, REPLICATION_APPLIER, REPLICATION_SLAVE_ADMIN, ROLE_ADMIN, SELECT, SHUTDOWN, SYSTEM_VARIABLES_ADMIN ON *.* TO 'repl'@'%' WITH GRANT OPTION;
GRANT DELETE, INSERT, UPDATE ON mysql.* TO 'repl'@'%' WITH GRANT OPTION;
GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SHOW VIEW, TRIGGER, UPDATE ON mysql_innodb_cluster_metadata.* TO 'repl'@'%' WITH GRANT OPTION;
GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SHOW VIEW, TRIGGER, UPDATE ON mysql_innodb_cluster_metadata_bkp.* TO 'repl'@'%' WITH GRANT OPTION;
GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SHOW VIEW, TRIGGER, UPDATE ON mysql_innodb_cluster_metadata_previous.* TO 'repl'@'%' WITH GRANT OPTION;
For more information, see the online documentation.

Look at the ERROR lines: the required grants are already spelled out for you. Copy them and run them over a Shell connection, or grant them directly on that node. If the node is in read-only mode, turn it off first with:

set global super_read_only=off;
set global read_only=off;
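Before applying the GRANT statements, you can confirm both flags are actually off (a quick check using standard MySQL system variables; both should return 0):

select @@global.super_read_only, @@global.read_only;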

Configuration error when a new node joins the cluster

 MySQL  mysql-master:3306 ssl  JS > cluster.addInstance("repl@mysql-node-3:3306", {     password: "111111",     recoveryMethod: "clone",   });
WARNING: The password option is deprecated and will be removed in a future release.


WARNING: A GTID set check of the MySQL instance at 'mysql-node-3:3306' determined that it contains transactions that do not originate from the cluster, which must be discarded before it can join the cluster.

mysql-node-3:3306 has the following errant GTIDs that do not exist in the cluster:
812260f5-482d-11f0-bb8f-7234986dc8ac:1-13

WARNING: Discarding these extra GTID events can either be done manually or by completely overwriting the state of mysql-node-3:3306 with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.

Having extra GTID events is not expected, and it is recommended to investigate this further and ensure that the data can be removed prior to choosing the clone recovery method.
Clone based recovery selected through the recoveryMethod option

Validating instance configuration at mysql-node-3:3306...

This instance reports its own address as mysql-node-3:3306

NOTE: Some configuration options need to be fixed:
+----------------------------------------+---------------+----------------+----------------------------+
| Variable                               | Current Value | Required Value | Note                       |
+----------------------------------------+---------------+----------------+----------------------------+
| binlog_transaction_dependency_tracking | COMMIT_ORDER  | WRITESET       | Update the server variable |
+----------------------------------------+---------------+----------------+----------------------------+

NOTE: Please use the dba.configureInstance() command to repair these issues.

ERROR: Instance must be configured and validated with dba.checkInstanceConfiguration() and dba.configureInstance() before it can be used in an InnoDB cluster.
ERROR: RuntimeError: Instance check failed
Cluster.addInstance: Instance check failed (RuntimeError)

Connect to the newly added node with MySQL Shell, then repair the configuration with the following command.

 MySQL  mysql-node-3:3306 ssl  JS > dba.configureInstance('repl@mysql-node-3:3306');
Please provide the password for 'repl@mysql-node-3:3306': ******
Save password for 'repl@mysql-node-3:3306'? [Y]es/[N]o/Ne[v]er (default No):
Configuring MySQL instance at mysql-node-3:3306 for use in an InnoDB cluster...

This instance reports its own address as mysql-node-3:3306

applierWorkerThreads will be set to the default value of 4.

NOTE: Some configuration options need to be fixed:
+----------------------------------------+---------------+----------------+----------------------------+
| Variable                               | Current Value | Required Value | Note                       |
+----------------------------------------+---------------+----------------+----------------------------+
| binlog_transaction_dependency_tracking | COMMIT_ORDER  | WRITESET       | Update the server variable |
+----------------------------------------+---------------+----------------+----------------------------+

Do you want to perform the required configuration changes? [y/n]: y
Configuring instance...

WARNING: '@@binlog_transaction_dependency_tracking' is deprecated and will be removed in a future release. (Code 1287).
The instance 'mysql-node-3:3306' was configured to be used in an InnoDB cluster.
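For this particular variable, what dba.configureInstance() does amounts to persisting the required value, which you could also do manually on the node (note the deprecation warning above; this variable is deprecated in recent 8.0 releases):

SET PERSIST binlog_transaction_dependency_tracking = 'WRITESET';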

Insufficient-privilege ERROR after the new node joins the cluster

            "mysql-node-3:3306": {
                "address": "mysql-node-3:3306",
                "instanceErrors": [
                    "WARNING: Instance is not managed by InnoDB cluster. Use cluster.rescan() to repair.",
                    "ERROR: Invalid or missing information of Group Replication's network address in metadata. Use Cluster.rescan() to update the metadata.",
                    "ERROR: Metadata for this instance does not match X plugin port reported by instance (metadata=, actual=mysql-node-3:33060). Use rescan() to update the metadata."
                ],
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.42"
            }
        },

Same as above: grant the node the required privileges, then force re-add it once.

cluster.addInstance("repl@mysql-node-3:3306", {password: "111111",recoveryMethod: "clone",force:true});
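Once the forced re-add completes, the same command that produced the JSON above should now report the node as ONLINE with no instanceErrors:

 MySQL  mysql-master:3306 ssl  JS > cluster.status();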

Metadata mismatch after the new node joins the cluster

            "mysql-node-3:3306": {
                "address": "mysql-node-3:3306",
                "instanceErrors": [
                    "WARNING: Instance is not managed by InnoDB cluster. Use cluster.rescan() to repair.",
                    "ERROR: Invalid or missing information of Group Replication's network address in metadata. Use Cluster.rescan() to update the metadata.",
                    "ERROR: Metadata for this instance does not match X plugin port reported by instance (metadata=, actual=mysql-node-3:33060). Use rescan() to update the metadata."

Connect to the node, reset its binlog/GTID state and any replication channels left over from previous join attempts, then force re-add the node once.

 MySQL  mysql-node-3:3306 ssl  SQL > set global super_read_only=off;
Query OK, 0 rows affected (0.0008 sec)
 MySQL  mysql-node-3:3306 ssl  SQL > set global read_only=off;
 MySQL  mysql-node-3:3306 ssl  SQL > reset master;
 MySQL  mysql-node-3:3306 ssl  SQL > reset replica all;

The root cause is simply which port you connect to: when using mysqlsh, specify port 3306 (the classic protocol port). If you add the instance through port 33060 instead, you end up with exactly this metadata mismatch.
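For example (port numbers as used throughout this setup; 3306 is the classic protocol port the AdminAPI expects, 33060 is the X protocol port):

mysqlsh repl@mysql-node-3:3306     # correct: classic protocol port
mysqlsh repl@mysql-node-3:33060    # wrong for addInstance: X protocol port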
