Research on openGauss physical backup and recovery: gs_basebackup

Gauss squirrel Club 2021-11-25 17:49:51

https://www.modb.pro/db/87947

openGauss physical backup and recovery: gs_basebackup

Background of gs_basebackup

  • gs_basebackup supports only full backups; incremental backups are not supported.
  • gs_basebackup currently supports hot backup mode and compressed-format backup mode.
  • When gs_basebackup backs up a tablespace that contains an absolute path and the backup is made on the same machine, redirect the tablespace path with tablespace-mapping, or use archive mode for the backup.
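
The tablespace-mapping redirect mentioned above takes an olddir=newdir pair. The following is a minimal sketch of building that value, assuming the rules given in the -T description in the parameter section (both paths absolute, any = inside a path escaped with a backslash). The helper name tablespace_mapping and the paths /data/tbs1 and /backup/tbs1 are hypothetical; the host and port are the ones used in the experiment later in this article. The command is only printed, not executed.

```shell
# Build the value for -T/--tablespace-mapping from two absolute paths.
# tablespace_mapping is a hypothetical helper, not part of openGauss.
tablespace_mapping() (
    old=$1
    new=$2
    # Both directories have to be absolute paths.
    case $old in /*) ;; *) echo "olddir must be absolute: $old" >&2; exit 1 ;; esac
    case $new in /*) ;; *) echo "newdir must be absolute: $new" >&2; exit 1 ;; esac
    # Escape every "=" inside a path with a backslash.
    old_escaped=$(printf '%s' "$old" | sed 's/=/\\=/g')
    new_escaped=$(printf '%s' "$new" | sed 's/=/\\=/g')
    printf '%s=%s\n' "$old_escaped" "$new_escaped"
)

# Print (do not run) a backup command that relocates one tablespace.
# /data/tbs1 and /backup/tbs1 are hypothetical paths.
echo "gs_basebackup -D /home/omm/backup -h wzsy01 -p 15400 -T $(tablespace_mapping /data/tbs1 /backup/tbs1)"
```

Because the function body runs in a subshell, a failed check only makes the helper return a non-zero status; it does not terminate the calling script.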

gs_basebackup parameter description

gs_basebackup parameters fall into the following categories:

  • -D directory Output directory for the backup files. This option is required.

Common parameters:

  • -c, --checkpoint=fast|spread Sets the checkpoint mode to fast or spread (the default).

  • -l, --label=LABEL Sets a label for the backup.

  • -P, --progress Enables progress reporting.

  • -v, --verbose Enables verbose mode.

  • -V, --version Prints the version and exits.

  • -?, --help Shows the gs_basebackup command-line arguments.

  • -T, --tablespace-mapping=olddir=newdir

    Relocates the tablespace in directory olddir to newdir during the backup. For this to take effect, olddir must exactly match the path where the tablespace is located (but it is not an error if the backup contains no tablespace in olddir). Both olddir and newdir must be absolute paths. If a path contains an = sign, escape it with a backslash. This option can be given multiple times for multiple tablespaces.

  • -F, --format=plain|tar

    Sets the output format to plain (the default) or tar. The plain format writes the output as plain files with the same layout as the current data directory and tablespaces. When the cluster has no additional tablespaces, the entire database is placed in the target directory. If the cluster contains additional tablespaces, the main data directory is placed in the target directory, but all other tablespaces are placed at the same absolute path they have on the server. The tar format writes the output as tar files in the target directory. The main data directory is written to a file named base.tar, and every other tablespace is written to a file named after its OID. The generated tar packages must be extracted with the gs_tar command.

  • -X, --xlog-method=fetch|stream

    Sets the xlog transfer method. If this parameter is not set, the default is --xlog-method=stream. The required write-ahead log (WAL) files are included in the backup, including all WAL generated during the backup. The fetch method collects the WAL files at the end of the backup, so wal_keep_segments must be set high enough that the logs are not removed before the backup ends. If a log has already been rotated away when it is due to be transferred, the backup fails and is unusable. The stream method streams the WAL while the backup is being created: it opens a second connection to the server and streams the WAL in parallel with the backup, so it uses up to two of the connections configured by max_wal_senders. As long as the client keeps up with receiving the WAL, this mode requires no extra WAL to be retained on the primary.

  • -x, --xlog Equivalent to using -X with the fetch method.

  • -Z, --compress=level

    Enables gzip compression of the tar output and sets the compression level (0 to 9, where 0 means no compression and 9 is the best compression). Compression is available only with the tar format, and the suffix .gz is automatically appended to all tar file names.

  • -z Enables gzip compression of the tar output with the default compression level. Compression is available only with the tar format, and the suffix .gz is automatically appended to all tar file names.

  • -t, --rw-timeout Sets the time limit for the checkpoint during the backup; the default is 120s. When a full checkpoint of the database takes a long time, increase rw-timeout accordingly.
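
Several of the options above constrain each other, so it can help to check them before starting a long backup. The sketch below enforces only the rules stated in this list: the format is plain or tar, the xlog method is fetch or stream, and gzip compression needs the tar format with a level from 0 to 9. The helper name basebackup_cmd is hypothetical, and the command line is printed rather than executed.

```shell
# Check a few gs_basebackup option combinations and print the resulting
# command line. basebackup_cmd is a hypothetical helper.
# usage: basebackup_cmd DIR [FORMAT] [XLOG_METHOD] [GZIP_LEVEL]
basebackup_cmd() (
    dir=$1 format=${2:-plain} method=${3:-stream} level=${4:-}
    case $format in
        plain|tar) ;;
        *) echo "format must be plain or tar" >&2; exit 1 ;;
    esac
    case $method in
        fetch|stream) ;;
        *) echo "xlog method must be fetch or stream" >&2; exit 1 ;;
    esac
    cmd="gs_basebackup -D $dir -F $format -X $method"
    if [ -n "$level" ]; then
        # gzip compression is only available with the tar format.
        [ "$format" = tar ] || { echo "compression requires -F tar" >&2; exit 1; }
        case $level in
            [0-9]) ;;
            *) echo "compression level must be 0 to 9" >&2; exit 1 ;;
        esac
        cmd="$cmd -Z $level"
    fi
    printf '%s\n' "$cmd"
)

basebackup_cmd /home/omm/backup tar fetch 5
# prints: gs_basebackup -D /home/omm/backup -F tar -X fetch -Z 5
```

With the fetch method shown in the example, wal_keep_segments on the server still has to be large enough for the duration of the backup, as described under -X.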

Connection parameters:

  • -h, --host=HOSTNAME Specifies the host name of the machine running the server, or the path to the Unix domain socket.
  • -p, --port=PORT Specifies the port number of the database server.
  • -U, --username=USERNAME Specifies the user that connects to the database.
  • -s, --status-interval=INTERVAL Specifies the interval, in seconds, between status packets sent to the server.
  • -w, --no-password Never prompts for a password.
  • -W, --password Forces a password prompt when -U is used to connect to a local or remote database.
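
As a small illustration of how the connection parameters combine, the sketch below assembles them into one option string and adds -W only when a user is passed with -U. The helper name connection_opts and the user name backup_user are hypothetical; the host and port are the ones used in the experiment below. The commands are printed, not executed.

```shell
# Assemble connection options for gs_basebackup.
# connection_opts is a hypothetical helper.
# usage: connection_opts HOST PORT [USER]
connection_opts() (
    host=$1 port=$2 user=${3:-}
    opts="-h $host -p $port"
    if [ -n "$user" ]; then
        # With -U, add -W so that a password prompt appears.
        opts="$opts -U $user -W"
    fi
    printf '%s\n' "$opts"
)

echo "gs_basebackup -D /home/omm/backup $(connection_opts wzsy01 15400)"
echo "gs_basebackup -D /home/omm/backup $(connection_opts wzsy01 15400 backup_user)"
```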

Experimental example

Simulated experiment in which the primary node loses important files:

  • View the cluster status:

    [[email protected] ~]$ gs_om -t status --detail
    [ Cluster State ]
    cluster_state : Normal
    redistributing : No
    current_az : AZ_ALL
    [ Datanode State ]
    node node_ip instance state | node node_ip instance state
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1 wzsy01 9.1.14.39 6001 /opt/huawei/install/data/d1 P Primary Normal | 2 wzsy02 9.1.14.40 6002 /opt/huawei/install/data/d2 S Standby Normal
    
  • Back up the primary node:

    [[email protected] ~]$ gs_basebackup -D /home/omm/backup/ -h wzsy01 -p 15400
    INFO: The starting position of the xlog copy of the full build is: 0/4B000028. The slot minimum LSN is: 0/4B000140.
    [2021-07-26 11:22:00]:begin build tablespace list
    [2021-07-26 11:22:00]:finish build tablespace list
    [2021-07-26 11:22:00]:begin get xlog by xlogstream
    [2021-07-26 11:22:00]: check identify system success
    [2021-07-26 11:22:00]: send START_REPLICATION 0/4B000000 success
    [2021-07-26 11:22:00]: keepalive message is received
    [2021-07-26 11:22:00]: keepalive message is received
    [2021-07-26 11:22:03]: keepalive message is received
    [2021-07-26 11:22:11]:gs_basebackup: base backup successfully
    [[email protected] ~]$ cd backup/
    [[email protected] backup]$ ls -lh
    total 4.9M
    -rw------- 1 omm dbgrp 208 Jul 26 11:22 backup_label
    drwx------ 6 omm dbgrp 54 Jul 26 11:22 base
    -rw------- 1 omm dbgrp 4.3K Jul 26 11:22 cacert.pem
    drwx------ 2 omm dbgrp 4.0K Jul 26 11:22 global
    -rw------- 1 omm dbgrp 4.7M Jul 26 11:22 gswlm_userinfo.cfg
    -rw------- 1 omm dbgrp 20K Jul 26 11:22 mot.conf
    drwx------ 2 omm dbgrp 26 Jul 26 11:22 pg_clog
    drwx------ 2 omm dbgrp 26 Jul 26 11:22 pg_csnlog
    -rw------- 1 omm dbgrp 0 Jul 26 11:22 pg_ctl.lock
    drwx------ 2 omm dbgrp 6 Jul 26 11:22 pg_errorinfo
    -rw------- 1 omm dbgrp 4.5K Jul 26 11:22 pg_hba.conf
    -rw------- 1 omm dbgrp 4.5K Jul 26 11:22 pg_hba.conf.bak
    -rw------- 1 omm dbgrp 1.0K Jul 26 11:22 pg_hba.conf.lock
    -rw------- 1 omm dbgrp 1.6K Jul 26 11:22 pg_ident.conf
    drwx------ 4 omm dbgrp 39 Jul 26 11:22 pg_llog
    drwx------ 4 omm dbgrp 36 Jul 26 11:22 pg_multixact
    drwx------ 2 omm dbgrp 26 Jul 26 11:22 pg_notify
    drwx------ 2 omm dbgrp 6 Jul 26 11:22 pg_replslot
    drwx------ 2 omm dbgrp 6 Jul 26 11:22 pg_serial
    drwx------ 2 omm dbgrp 6 Jul 26 11:22 pg_snapshots
    drwx------ 2 omm dbgrp 25 Jul 26 11:22 pg_stat_tmp
    drwx------ 2 omm dbgrp 6 Jul 26 11:22 pg_tblspc
    drwx------ 2 omm dbgrp 6 Jul 26 11:22 pg_twophase
    -rw------- 1 omm dbgrp 4 Jul 26 11:22 PG_VERSION
    drwx------ 3 omm dbgrp 92 Jul 26 11:22 pg_xlog
    -rw------- 1 omm dbgrp 38K Jul 26 11:22 postgresql.conf
    -rw------- 1 omm dbgrp 38K Jul 26 11:22 postgresql.conf.bak
    -rw------- 1 omm dbgrp 1.0K Jul 26 11:22 postgresql.conf.lock
    -rw------- 1 omm dbgrp 4.3K Jul 26 11:22 server.crt
    -rw------- 1 omm dbgrp 1.8K Jul 26 11:22 server.key
    -rw------- 1 omm dbgrp 56 Jul 26 11:22 server.key.cipher
    -rw------- 1 omm dbgrp 24 Jul 26 11:22 server.key.rand
    
  • Simulate missing files on the primary node:

    [[email protected] ~]$ cd /opt/huawei/install/data/d1/
    [[email protected] d1]$ ls
    base pg_csnlog pg_llog pg_stat_tmp postgresql.conf.lock
    cacert.pem pg_ctl.lock pg_location pg_tblspc postmaster.opts
    gaussdb.state pg_errorinfo pg_multixact pg_twophase postmaster.pid
    global pg_hba.conf pg_notify PG_VERSION server.crt
    gswlm_userinfo.cfg pg_hba.conf.bak pg_replslot pg_xlog server.key
    mot.conf pg_hba.conf.lock pg_serial postgresql.conf server.key.cipher
    pg_clog pg_ident.conf pg_snapshots postgresql.conf.bak server.key.rand
    [[email protected] d1]$ rm -rf server.*
    [[email protected] d1]$ rm -rf pg_hba.conf*
    [[email protected] d1]$ ls
    base pg_clog pg_location pg_stat_tmp postgresql.conf.bak
    cacert.pem pg_csnlog pg_multixact pg_tblspc postgresql.conf.lock
    gaussdb.state pg_ctl.lock pg_notify pg_twophase postmaster.opts
    global pg_errorinfo pg_replslot PG_VERSION postmaster.pid
    gswlm_userinfo.cfg pg_ident.conf pg_serial pg_xlog
    mot.conf pg_llog pg_snapshots postgresql.conf
    [[email protected] d1]$ gs_om -t status --detail
    [ Cluster State ]
    cluster_state : Normal
    redistributing : No
    current_az : AZ_ALL
    [ Datanode State ]
    node node_ip instance state | node node_ip instance state
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1 wzsy01 9.1.14.39 6001 /opt/huawei/install/data/d1 P Primary Normal | 2 wzsy02 9.1.14.40 6002 /opt/huawei/install/data/d2 S Standby Normal
    
  • Restart the cluster

    [[email protected] d1]$ gs_om -t stop
    Stopping cluster.
    =========================================
    Successfully stopped cluster.
    =========================================
    End stop cluster.
    [[email protected] d1]$ gs_om -t start
    Starting cluster.
    =========================================
    [SUCCESS] wzsy02
    2021-07-26 11:25:03.656 60fe2b0f.1 [unknown] 139725014521600 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4250 Mbytes) is larger.
    =========================================
    [GAUSS-53600]: Can not start the database, the cmd is source /home/omm/.bashrc; python3 '/opt/huawei/install/om/script/local/StartInstance.py' -U omm -R /opt/huawei/install/app -t 300 --security-mode=off, Error:
    [FAILURE] wzsy01:
    [GAUSS-51607] : Failed to start instance. Error: Please check the gs_ctl log for failure details.
    [2021-07-26 11:24:59.261][90121][][gs_ctl]: gs_ctl started,datadir is /opt/huawei/install/data/d1
    [2021-07-26 11:24:59.434][90121][][gs_ctl]: waiting for server to start...
    .0 LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
    0 LOG: [Alarm Module]Host Name: wzsy01
    0 LOG: [Alarm Module]Host IP: 9.1.14.39
    0 LOG: [Alarm Module]Cluster Name: Cluster_template
    0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52
    0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
    0 WARNING: failed to parse feature control file: gaussdb.version.
    0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
    0 LOG: Failed to initialze environment for codegen.
    The core dump path is an invalid directory
    2021-07-26 11:24:59.789 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
    2021-07-26 11:24:59.789 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
    2021-07-26 11:24:59.789 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
    2021-07-26 11:24:59.789 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host Name: wzsy01
    2021-07-26 11:24:59.790 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host IP: 9.1.14.39
    2021-07-26 11:24:59.790 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Cluster Name: Cluster_template
    2021-07-26 11:24:59.790 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52
    2021-07-26 11:24:59.790 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Transparent encryption disabled.
    2021-07-26 11:24:59.797 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: loaded library "security_plugin"
    2021-07-26 11:24:59.799 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
    2021-07-26 11:24:59.799 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4250 Mbytes) is larger.
    2021-07-26 11:24:59.907 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
    2021-07-26 11:24:59.960 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set metadata cache size(268435456)
    2021-07-26 11:25:00.451 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/opt/huawei/install/data/d1/gaussdb.state.temp" success
    2021-07-26 11:25:00.452 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Standby)
    2021-07-26 11:25:00.478 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 979, usable_fds = 1000, already_open = 11
    The core dump path is an invalid directory
    2021-07-26 11:25:00.482 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: the configure file /opt/huawei/install/app/etc/gscgroup_omm.cfg doesn't exist or the size of configure file has changed. Please create it by root user!
    2021-07-26 11:25:00.482 60fe2b0b.1 [unknown] 140465425811200 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Failed to parse cgroup config file.
    .[2021-07-26 11:25:02.249][90121][][gs_ctl]: waitpid 90124 failed, exitstatus is 256, ret is 2
    [2021-07-26 11:25:02.250][90121][][gs_ctl]: stopped waiting
    [2021-07-26 11:25:02.250][90121][][gs_ctl]: could not start server
    Examine the log output..
    
  • The primary node reports an error during cluster startup. The standby node starts, but its status shows that it needs repair:

    [[email protected] d1]$ gs_om -t status --detail
    [ Cluster State ]
    cluster_state : Unavailable
    redistributing : No
    current_az : AZ_ALL
    [ Datanode State ]
    node node_ip instance state | node node_ip instance state
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1 wzsy01 9.1.14.39 6001 /opt/huawei/install/data/d1 P Down Manually stopped | 2 wzsy02 9.1.14.40 6002 /opt/huawei/install/data/d2 S Standby Need repair(Disconnected)
    
  • Restore the deleted files from the backup and start the original primary node manually:

    [[email protected] d1]$ cp /home/omm/backup/server.* /opt/huawei/install/data/d1/
    [[email protected] d1]$ cp /home/omm/backup/pg_hba.conf* /opt/huawei/install/data/d1/
    [[email protected] d1]$ gs_ctl start -D /opt/huawei/install/data/d1/ -M primary
    [2021-07-26 14:51:40.756][7855][][gs_ctl]: gs_ctl started,datadir is /opt/huawei/install/data/d1
    [2021-07-26 14:51:40.933][7855][][gs_ctl]: waiting for server to start...
    .0 LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
    0 LOG: [Alarm Module]Host Name: wzsy01
    0 LOG: [Alarm Module]Host IP: 9.1.14.39
    0 LOG: [Alarm Module]Cluster Name: Cluster_template
    0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52
    0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
    0 WARNING: failed to parse feature control file: gaussdb.version.
    0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
    0 LOG: Failed to initialze environment for codegen.
    The core dump path is an invalid directory
    2021-07-26 14:51:41.300 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
    2021-07-26 14:51:41.300 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
    2021-07-26 14:51:41.300 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
    2021-07-26 14:51:41.301 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host Name: wzsy01
    2021-07-26 14:51:41.301 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host IP: 9.1.14.39
    2021-07-26 14:51:41.301 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Cluster Name: Cluster_template
    2021-07-26 14:51:41.301 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52
    2021-07-26 14:51:41.301 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Transparent encryption disabled.
    2021-07-26 14:51:41.305 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: loaded library "security_plugin"
    2021-07-26 14:51:41.307 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
    2021-07-26 14:51:41.307 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4250 Mbytes) is larger.
    2021-07-26 14:51:41.412 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
    2021-07-26 14:51:41.464 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set metadata cache size(268435456)
    2021-07-26 14:51:41.944 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/opt/huawei/install/data/d1/gaussdb.state.temp" success
    2021-07-26 14:51:41.944 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Primary)
    2021-07-26 14:51:41.971 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 978, usable_fds = 1000, already_open = 12
    The core dump path is an invalid directory
    2021-07-26 14:51:41.975 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: the configure file /opt/huawei/install/app/etc/gscgroup_omm.cfg doesn't exist or the size of configure file has changed. Please create it by root user!
    2021-07-26 14:51:41.975 60fe5b7d.1 [unknown] 139948390721280 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Failed to parse cgroup config file.
    [2021-07-26 14:51:43.017][7855][][gs_ctl]: done
    [2021-07-26 14:51:43.017][7855][][gs_ctl]: server started (/opt/huawei/install/data/d1)
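
The two cp commands in this step can be generalized into a small helper. The sketch below copies files that match the given patterns from a plain-format backup directory into the data directory, but only when they are missing there, so existing files are never overwritten. The helper name restore_missing is hypothetical, and the paths in the usage comment are the ones from this experiment.

```shell
# Copy files matching the given patterns from a plain-format backup into
# the data directory, but only when they are missing there.
# restore_missing is a hypothetical helper.
# usage: restore_missing BACKUP_DIR DATA_DIR PATTERN...
restore_missing() (
    backup_dir=$1 data_dir=$2
    shift 2
    for pattern in "$@"; do
        # $pattern is left unquoted on purpose so the shell expands the glob.
        for src in "$backup_dir"/$pattern; do
            [ -f "$src" ] || continue           # the pattern matched nothing
            name=$(basename "$src")
            if [ ! -e "$data_dir/$name" ]; then
                cp -p "$src" "$data_dir/$name"  # -p keeps mode and timestamps
                echo "restored $name"
            fi
        done
    done
)

# Usage with the paths from this experiment (run as the database user):
# restore_missing /home/omm/backup /opt/huawei/install/data/d1 'server.*' 'pg_hba.conf*'
```

Quoting the patterns at the call site keeps the current directory from expanding them before the helper sees them.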
    
  • The cluster state returns to normal

    [[email protected] d1]$ gs_om -t status --detail
    [ Cluster State ]
    cluster_state : Normal
    redistributing : No
    current_az : AZ_ALL
    [ Datanode State ]
    node node_ip instance state | node node_ip instance state
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1 wzsy01 9.1.14.39 6001 /opt/huawei/install/data/d1 P Primary Normal | 2 wzsy02 9.1.14.40 6002 /opt/huawei/install/data/d2 S Standby Normal
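
To check the result in a script instead of by eye, the cluster_state line can be extracted from the status output. This is a minimal sketch that assumes the "cluster_state : VALUE" layout shown in the transcripts above; the helper name cluster_state_of is hypothetical, and the sample input is copied from the output above rather than read from a live cluster.

```shell
# Extract the cluster_state value from "gs_om -t status --detail" output
# read on standard input. cluster_state_of is a hypothetical helper.
cluster_state_of() {
    awk -F: '$1 ~ /cluster_state/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}

# Sample input copied from the status output above.
printf 'cluster_state : Normal\nredistributing : No\n' | cluster_state_of
# prints: Normal

# On a real cluster: gs_om -t status --detail | cluster_state_of
```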
    