一套19C CDB数据库,存储更换HBA卡宕,本文详述这起begin backup导致的故障恢复全过程。
半夜接到客户反馈,一套19C CDB数据库,存储更换HBA卡宕,起不来了,OPEN时提示需要介质恢复,这里截了一段ALERT LOG。
2020-07-28T23:40:53.328908+08:00
Errors in file /u02/app/oracle/diag/rdbms/racdb3/racdb32/trace/racdb32_ora_306493.trc:
ORA-10873: file 1 needs to be either taken out of backup mode or media recovered
ORA-01110: data file 1: '+DATA/RACDB3/DATAFILE/system.278.1037610503'
2020-07-28T23:40:53.387627+08:00
Errors in file /u02/app/oracle/diag/rdbms/racdb3/racdb32/trace/racdb32_ora_306493.trc:
ORA-10873: file 1 needs to be either taken out of backup mode or media recovered
ORA-01110: data file 1: '+DATA/RACDB3/DATAFILE/system.278.1037610503'
ORA-10873 signalled during: ALTER DATABASE OPEN /* db agent *//* {0:17:3557} */...
2020-07-28T23:40:55.357177+08:00
License high water mark = 2
2020-07-28T23:40:55.357492+08:00
USER(prelim) (ospid: 310054): terminating the instance
2020-07-28T23:40:56.369307+08:00
Instance terminated by USER(prelim), pid = 310054
ORA-10873: file 1 needs to be either taken out of backup mode or media recovered
这里报ORA-10873是由于数据库或表空间BEGIN BACKUP导致,正确的处理方法只需要end backup即可。
alter database end backup;
alter tablespace [tablespace_name] end backup;
alter database open;
[oracle@test ~]$ oerr ora 10873
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = “en_us.utf8”
are supported and installed on your system.
perl: warning: Falling back to the standard locale (“C”).
10873, 00000, “file %s needs to be either taken out of backup mode or media recovered”
// *Cause: An attempt was made to open a database after an instance failure or
// SHUTDOWN ABORT interrupted an online backup.
// *Action: If the indicated file is not a restored backup, then issue the
// ALTER DATABASE END BACKUP command and open the database. If the
// file is a restored online backup, then apply media recovery to
// it and open the database.
当时RECOVER DATABASE 提示找不到归档(需要6-18号的归档)
由于有存储相关操作,误以为其它原因导致的问题,没有关注该报错,查询vdatafile,vdatafile,vdatafile_header发现检查点为上个月的6-18号。
---当前日期为2020-07-28T23:40:56
SQL> col name for a10
SQL> select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from v$containers a,v$datafile b where a.con_id=b.con_id order by checkpoint_change#;
CON_ID NAME FILE# RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME STATUS
-------------------- ---------- -------------------- -------------------- -------------------- ------------------- -------
2 PDB$SEED 6 4 2336937 2026-03-16T16:49:32+00:00 ONLINE
2 PDB$SEED 5 1 2336937 2026-03-16T16:49:32+00:00 SYSTEM
2 PDB$SEED 8 9 2336937 2026-03-16T16:49:32+00:00 ONLINE
1 CDB$ROOT 178 178 8041849750453 2026-03-16T16:49:32+00:00 ONLINE
1 CDB$ROOT 7 7 8041849750453 2026-03-16T16:49:32+00:00 ONLINE
1 CDB$ROOT 9 9 8041849750453 2026-03-16T16:49:32+00:00 ONLINE
1 CDB$ROOT 4 4 8041849750453 2026-03-16T16:49:32+00:00 ONLINE
1 CDB$ROOT 3 3 8041849750453 2026-03-16T16:49:32+00:00 ONLINE
1 CDB$ROOT 1 1 8041849750453 2026-03-16T16:49:32+00:00 SYSTEM
接着检查日志及询问客户也没有做restore的操作,误判断为出现了异常,未找到解决办法,因为有最近的全备,做了restore CDB$ROOT的操作,结果悲剧了,还原出来的数据文件,RECOVER仍然需要从6-18的归档开始,查询文件头检查点还是6-18号,接着查询备份中信息,文件的检查点也是6-18号,没遇到过这种情况,以为是ORACLE BUG。
RMAN> list backup of datafile 1;
List of Backup Sets
===================
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
2508 Incr 0 23.19G SBT_TAPE 00:04:26 **2026-03-16T16:49:32+00:00**
BP Key: 2508 Status: AVAILABLE Compressed: NO Tag: HOT_DB_BK_LEVEL0
Handle: bk_3490_1_1046766482 Media: @aaaa6
List of Datafiles in backup set 2508
File LV Type Ckp SCN Ckp Time Abs Fuz SCN Sparse Name
---- -- ---- ---------- ------------------- ----------- ------ ----
1 0 Incr 8041849750453 **2026-03-16T16:49:32+00:00** 8050242429418 NO +DATA/RACDB3/DATAFILE/system.278.1037610503
RMAN>
这里有一个大问题,就是由的数据库较大,restore前没有对当前环境做备份。切记,任何危险的变更操作都需要备份。做到可回退!!!
咨询公司专家后,确定为某此表空间做了begin backup导致。begin backup后文件头上的checkpoint不再更新。但经了解实际没有人为发起过backup backup,alert log中也没有找到begin backup的操作记录,需要再分析。
这时由于之前做了restore cdbroot的操作,控制文件,cdbroot的文件已从备份中还原,导致不能再end backup操作,1个月前的归档已清理,也没办法从6-18开始应用归档。没有办法通过正常的途径恢复数据库,悲剧!!!!
通过v\$backup确定文件处于begin backup热备模式。SQL> select * from v\$backup;
FILE# STATUS CHANGE# TIME CON_ID
-------------------- ------------------ -------------------- ------------------- --------------------
3 NOT ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 1
4 NOT ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 1
5 NOT ACTIVE 0 2
6 NOT ACTIVE 0 2
7 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 1
8 NOT ACTIVE 0 2.......
19 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
20 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
21 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4SQL> select * from v$backup;
FILE# STATUS CHANGE# TIME CON_ID
-------------------- ------------------ -------------------- ------------------- --------------------
3 NOT ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 1
4 NOT ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 1
5 NOT ACTIVE 0 2
6 NOT ACTIVE 0 2
7 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 1
8 NOT ACTIVE 0 2
9 NOT ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 1
15 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
16 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
17 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
18 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
FILE# STATUS CHANGE# TIME CON_ID
-------------------- ------------------ -------------------- ------------------- --------------------
19 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
20 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
21 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
22 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
23 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
24 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
25 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
26 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
27 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
28 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
29 ACTIVE 8041849750453 2026-03-16T16:49:32+00:00 4
如果不是CDB还可以重建控制文件,然后用以下方法解决,但如果是CDB,需要切到pdb中执行以下操作,重建控制文件后,PDB是ORACLE内部的名字,没办法切换PDB,所以以下方法行不通。
alter database datafile 1 offline;
alter database datafile 1 end backup;
alter database datafile 1 online;
最终的解决办法是,bbed修改文件头上的检查点信息,再应用近几天的归档,应用到最新状态,open resetlogs,最终0数据丢失恢复。
这里由于文件比较多,不好全copy到本地文件系统,用到了ASM未公开的内部包,只
读取ASM中的数据头数据块到本地,bbed修改完,再copy回去。
--copy datafile head from asm
Set pagesize 300
Set linesize 300
set numw 20
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
col name for a100
select '@tofs '||b.name||' '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/') from v$datafile b where b.checkpoint_change#<8050778122014 and con_id!=2 order by checkpoint_change#;
@tofs +DATA/RACDB/DATAFILE/system.278.1037610503 /u01/work/system.278.1037610503
@tofs +DATA/RACDB/DATAFILE/sysaux.261.1037610537 /u01/work/sysaux.261.1037610537
@tofs +DATA/RACDB/DATAFILE/undotbs1.288.1037610553 /u01/work/undotbs1.288.1037610553
@tofs +DATA/RACDB/DATAFILE/users.282.1037610553 /u01/work/users.282.1037610553
@tofs +DATA/RACDB/DATAFILE/undotbs2.263.1037611185 /u01/work/undotbs2.263.1037611185
@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/system.302.1038386279 /u01/work/system.302.1038386279
@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/sysaux.292.1038386277 /u01/work/sysaux.292.1038386277
@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/undotbs1.299.1038386279
......
@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/sysaux.272.1038492071 /u01/work/sysaux.272.1038492071
@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/undotbs1.281.1038492071 /u01/work/undotbs1.281.1038492071
@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/undo_2.283.1038492071 /u01/work/undo_2.283.1038492071
@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/users.279.1038492071 /u01/work/users.279.1038492071
@tofs +DATA/RACDB/DATAFILE/test11.425.1041690739 /u01/work/test11.425.1041690739
--生成bbed listfile
Set pagesize 300
Set linesize 300
set numw 20
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
col name for a100
select file#||' '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/') from v$datafile b where b.checkpoint_change#<8050778122014 and con_id!=2 order by checkpoint_change#;
BBED> info
File# Name Size(blks)
----- ---- ----------
1 /u01/work/system.278.1037610503 0
3 /u01/work/sysaux.261.1037610537 0
4 /u01/work/undotbs1.288.1037610553 0
7 /u01/work/users.282.1037610553 0
9 /u01/work/undotbs2.263.1037611185 0
15 /u01/work/system.302.1038386279 0
16 /u01/work/sysaux.292.1038386277 0
17 /u01/work/undotbs1.299.1038386279 0
18 /u01/work/undotbs1.303.1038386279 0
19 /u01/work/undotbs2.294.1038386277 0
20 /u01/work/undotbs2.301.1038386279 0
......
178 /u01/work/test11.425.1041690739 0
---bbed 从kcvfhbcp恢复检查点
assign file 1 block 1 kcvfhckp = file 1 block 1 kcvfhbcp
assign file 3 block 1 kcvfhckp = file 3 block 1 kcvfhbcp
assign file 4 block 1 kcvfhckp = file 4 block 1 kcvfhbcp
assign file 7 block 1 kcvfhckp = file 7 block 1 kcvfhbcp
assign file 9 block 1 kcvfhckp = file 9 block 1 kcvfhbcp
assign file 15 block 1 kcvfhckp = file 15 block 1 kcvfhbcp
assign file 16 block 1 kcvfhckp = file 16 block 1 kcvfhbcp
assign file 17 block 1 kcvfhckp = file 17 block 1 kcvfhbcp
assign file 18 block 1 kcvfhckp = file 18 block 1 kcvfhbcp
assign file 19 block 1 kcvfhckp = file 19 block 1 kcvfhbcp
......
assign file 119 block 1 kcvfhckp = file 119 block 1 kcvfhbcp
assign file 178 block 1 kcvfhckp = file 178 block 1 kcvfhbcp
assign file 181 block 1 kcvfhckp = file 181 block 1 kcvfhbcp
sum apply file 1 block 1
sum apply file 3 block 1
sum apply file 4 block 1
sum apply file 7 block 1
sum apply file 9 block 1
sum apply file 15 block 1
sum apply file 16 block 1
sum apply file 17 block 1
sum apply file 18 block 1
sum apply file 19 block 1
......
sum apply file 118 block 1
sum apply file 119 block 1
sum apply file 178 block 1
sum apply file 181 block 1
kcvfhckp 检查点,恢复起始点,begin backup后再不更新
kcvfhbcp begin backup后检查点(begin backup后检查点更新在该位置,end backup以该检查点更新kcvfhckp)
---copy to asm
Set pagesize 300
Set linesize 300
set numw 20
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
col name for a100
select '@toasm '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/')||' '||b.name from v$datafile b where b.checkpoint_change#<8050778122014 and con_id!=2 order by checkpoint_change#;
@toasm /u01/work/system.278.1037610503 +DATA/RACDB/DATAFILE/system.278.1037610503
@toasm /u01/work/sysaux.261.1037610537 +DATA/RACDB/DATAFILE/sysaux.261.1037610537
@toasm /u01/work/undotbs1.288.1037610553 +DATA/RACDB/DATAFILE/undotbs1.288.1037610553
@toasm /u01/work/users.282.1037610553 +DATA/RACDB/DATAFILE/users.282.1037610553
@toasm /u01/work/undotbs2.263.1037611185 +DATA/RACDB/DATAFILE/undotbs2.263.1037611185
@toasm /u01/work/system.302.1038386279 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/system.302.1038386279
@toasm /u01/work/sysaux.292.1038386277 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/sysaux.292.1038386277
@toasm /u01/work/undotbs1.299.1038386279 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/undotbs1.299.1038386279
.......
@toasm /u01/work/test11.425.1041690739 +DATA/RACDB/DATAFILE/test11.425.1041690739
查询v$datafileheader 确认checkpointchange#已更新。
...
✨ 接下来内容请访问原文(https://www.modb.pro/db/28494?YYF)进行查看~
更多数据库相关内容,可访问墨天轮(https://www.modb.pro/?YYF)进行浏览。
