加入收藏
最新动态
最新动态
经典案例

Oacle控制文件损坏导致数据库故障

来源:未知 作者:admin 人气: 时间:2014-02-07

问题:
1、某公司ORACLE故障,造成所有客户端都无法登陆,提示应用服务器配置错误。

分析及连线处理:
1、检查客户端日志,发现数据库异常。
Caused by: java.sql.SQLException: Listener refused the connection with the following error:
ORA-12528, TNS:listener: all appropriate instances are blocking new connections
The Connection descriptor used by the client was:
//10.111.1.112:1521/scmis1
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:260)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:386)
at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:438)
at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:164)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:34)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:752)
at java.sql.DriverManager.getConnection(DriverManager.java:525)
at java.sql.DriverManager.getConnection(DriverManager.java:140)
at com.kingdee.bos.sql.shell.KDDriver.connect(Unknown Source)
... 36 more

从错误信息看,数据库无法连接,提示实例不允许建立连接。

2、检查监听状态,显示scmis1实例处于阻塞状态。

3、检查alert日志,发现有异常。
Thread 1 advanced to log sequence 1397
Current log# 1 seq# 1397 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\scmis1\REDO01.LOG
Mon Nov 30 18:44:23 2013
Process startup failed, error stack:
Mon Nov 30 18:44:23 2013
Errors in file d:\oracle\product\10.2.0\admin\eas54\bdump\scmis1_psp0_3504.trc:
ORA-27300: OS 绯荤粺鐩稿叧鎿嶄綔: spcdr:9261:4200 澶辫触, 鐘舵€佷负: 997
ORA-27301: OS 鏁呴殰娑堟伅: 重叠 I/O 操作在进行中。
ORA-27302: 閿欒鍙戠敓鍦? skgpspawn

Mon Nov 30 18:44:24 2013
Process J000 died, see its trace file
Mon Nov 30 18:44:24 2013
kkjcre1p: unable to spawn jobq slave process
Mon Nov 30 18:44:24 2013
Errors in file d:\oracle\product\10.2.0\admin\scmis1\bdump\scmis1_cjq0_3648.trc:


Mon Nov 30 19:00:53 2013
Process startup failed, error stack:
Mon Nov 30 19:00:53 2013
Errors in file d:\oracle\product\10.2.0\admin\scmis1\bdump\scmis1_psp0_3504.trc:
ORA-27300: OS 绯荤粺鐩稿叧鎿嶄綔: spcdr:9261:4200 澶辫触, 鐘舵€佷负: 997
ORA-27301: OS 鏁呴殰娑堟伅: 重叠 I/O 操作在进行中。
ORA-27302: 閿欒鍙戠敓鍦? skgpspawn

从昨天晚上18:44开始,数据库开始报错,由于乱码不知具体原因,但可以看到OS以及I/O字样,所以判断应该是出现了IO错误

4、今天早上有人尝试重启过数据库,但启动没有成功。
Starting up ORACLE RDBMS Version: 10.2.0.1.0.
System parameters with non-default values:
processes = 200
sessions = 225
sga_max_size = 1367343104
__shared_pool_size = 318767104
shared_pool_size = 318767104
__large_pool_size = 8388608
large_pool_size = 8388608
__java_pool_size = 8388608
java_pool_size = 8388608
__streams_pool_size = 0
streams_pool_size = 0
spfile = D:\ORACLE\PRODUCT\10.2.0\DB_1\DBS\SPFILESCMIS1.ORA
nls_language = SIMPLIFIED CHINESE
nls_territory = CHINA
sga_target = 1367343104
control_files = D:\ORACLE\PRODUCT\10.2.0\ORADATA\SCMIS1\CONTROL01.CTL, D:\ORACLE\PRODUCT\10.2.0\ORADATA\SCMIS1\CONTROL02.CTL, D:\ORACLE\PRODUCT\10.2.0\ORADATA\SCMIS1\CONTROL03.CTL
db_block_size = 8192
__db_cache_size = 1023410176
db_cache_size = 838860800
compatible = 10.2.0.1.0
db_file_multiblock_read_count= 16
db_recovery_file_dest = D:\oracle\product\10.2.0/flash_recovery_area
db_recovery_file_dest_size= 2147483648
undo_management = AUTO
undo_tablespace = UNDOTBS1
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=eas54XDB)
job_queue_processes = 10
audit_file_dest = D:\ORACLE\PRODUCT\10.2.0\ADMIN\SCMIS1\ADUMP
background_dump_dest = D:\ORACLE\PRODUCT\10.2.0\ADMIN\SCMIS1\BDUMP
user_dump_dest = D:\ORACLE\PRODUCT\10.2.0\ADMIN\SCMIS1\UDUMP
core_dump_dest = D:\ORACLE\PRODUCT\10.2.0\ADMIN\SCMI1\CDUMP
db_name = scmis1
open_cursors = 300
_b_tree_bitmap_plans = FALSE
_no_or_expansion = TRUE
optimizer_index_cost_adj = 50
optimizer_index_caching = 90
pga_aggregate_target = 203423744
MMAN started with pid=4, OS id=3952
DBW0 started with pid=5, OS id=3968
LGWR started with pid=6, OS id=3976
CKPT started with pid=7, OS id=3992
SMON started with pid=8, OS id=4004
RECO started with pid=9, OS id=4020
CJQ0 started with pid=10, OS id=4044
MMON started with pid=11, OS id=4056
MMNL started with pid=12, OS id=4064
Tue Dec 01 09:21:28 2009
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
PMON started with pid=2, OS id=3916
Tue Dec 01 09:21:29 2009
alter database mount exclusive
ORA-214 signalled during: alter database mount exclusive...
PSP0 started with pid=3, OS id=3948

5、sqlplus连接数据库实例,并检查实例状态。
C:\Documents and Settings\Administrator>set ORACLE_SID=SCMIS1

C:\Documents and Settings\Administrator>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on 星期二 12月 1 10:39:46 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.


连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select instance_name ,status from v$instance;

INSTANCE_NAME STATUS
-------------------------------- ------------------------

scmis1 STARTED

实例状态为STARTED,未启动成功。

6、停止并启动数据库,发现有错误。
SQL> startup
ORACLE instance started.

Total System Global Area 1367343104 bytes
Fixed Size 1250884 bytes
Variable Size 335546812 bytes
Database Buffers 1023410176 bytes
Redo Buffers 7135232 bytes
ORA-00214: control file 'D:\ORACLE\PRODUCT\10.2.0\ORADATA\SCMIS1\CONTROL02.CTL'
version 14018 inconsistent with file
'D:\ORACLE\PRODUCT\10.2.0\ORADATA\SCMIS1\CONTROL01.CTL' version 14016

错误信息表明控制文件出现不一致的情况,估计是昨天晚上的IO故障导致控制文件出现不一致。

CONTROL02.CTL的版本号14018 > CONTROL01.CTL版本号14016

7、将三个控制文件先备份,然后用CONTROL02.CTL覆盖CONTROL01.CTL和CONTROL03.CTL

8、重启数据库,可以正常启动,系统恢复正常。

9、后检查操作系统事件查看器的日志,发现已经被清除,无法获知是否有IO方面的错误信息。


后续建议:
1、请密切关注服务器的IO状况,并检查磁盘是否有异常,并定时检查操作系统相关日志。
2、确保每天的数据备份异地保存,以防本地存储设备故障再次引起ORACLE数据故障,造成数据丢失。

联系我们

【 云 南 IT 服 务 中 心 】
手 机:13759502558
邮 箱:admin@ynits.com
邮 编:650000
地 址:昆明市高新区科华路1-19号山灞大厦主楼9楼
在线联系:马上通过QQ联系我们
        

电话

  • 三九手机网
  • 昆明强川科技
  • 云南蓝队网络
  • 公司地址:昆明市高新区科华路1-19号山灞大厦主楼9楼 邮编:650000
    手机:13759502558 传真:0871-63568770
      Copyright © 2012-2020 ynits.com 云南IT服务中心 版权所有 滇ICP备13000495号-1