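I have a two-node T5240 cluster. Node 2 currently will not boot, so I wanted to shut down the cluster on node 1 and start the database and applications by hand. However, the shared disk sets are degraded and cannot be mounted on node 1, so service cannot be restored. My initial guess is that node 2 never released ownership of the disk sets, which leaves node 1 unable to use them. My plan is to fix the disk set problem on node 1 first and get the business back up, then deal with node 2. Could the experts here take a look and suggest a good approach? The logs follow: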
------------------------------------------------------------------
-- Cluster Nodes --
Node name Status
--------- ------
Cluster node: JL-ESN-NMS-SVR-DB1-S Online
Cluster node: JL-ESN-NMS-SVR-APP1-S Offline
------------------------------------------------------------------
-- Cluster Transport Paths --
Endpoint Endpoint Status
-------- -------- ------
Transport path: JL-ESN-NMS-SVR-DB1-S:nxge3 JL-ESN-NMS-SVR-APP1-S:nxge3 faulted
Transport path: JL-ESN-NMS-SVR-DB1-S:nxge2 JL-ESN-NMS-SVR-APP1-S:nxge2 faulted
------------------------------------------------------------------
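-- Quorum Summary --
Quorum votes possible: 3
Quorum votes needed: 2
Quorum votes present: 2
-- Quorum Votes by Node --
Node Name Present Possible Status
--------- ------- -------- ------
Node votes: JL-ESN-NMS-SVR-DB1-S 1 1 Online
Node votes: JL-ESN-NMS-SVR-APP1-S 0 1 Offline
-- Quorum Votes by Device --
Device Name Present Possible Status
----------- ------- -------- ------
Device votes: /dev/did/rdsk/d10s2 1 1 Online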
------------------------------------------------------------------
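As a sanity check on the quorum summary above, here is a minimal Python sketch of the majority rule. It assumes the required count is simply more than half of the configured votes, which matches the 3 / 2 / 2 figures shown; the function names are hypothetical and not part of any cluster tool.

```python
def votes_needed(possible: int) -> int:
    """Smallest vote count that is a strict majority of `possible`."""
    return possible // 2 + 1


def has_quorum(present: int, possible: int) -> bool:
    """True when the votes currently present reach the majority threshold."""
    return present >= votes_needed(possible)


# Figures taken from the output above: one vote per node plus one quorum device.
possible = 1 + 1 + 1   # DB1 node, APP1 node, /dev/did/rdsk/d10s2
present = 1 + 0 + 1    # DB1 online, APP1 offline, quorum device online

print(votes_needed(possible))         # 2
print(has_quorum(present, possible))  # True
```

By this count node 1 still holds quorum on its own, so the degraded device groups would have to be explained by something other than a lost quorum.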
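-- Device Group Servers --
Device Group Primary Secondary
------------ ------- ---------
Device group servers: PKGJLJYFX JL-ESN-NMS-SVR-DB1-S -
Device group servers: PKGJLNMS JL-ESN-NMS-SVR-DB1-S -
-- Device Group Status --
Device Group Status
------------ ------
Device group status: PKGJLJYFX Degraded
Device group status: PKGJLNMS Degraded
-- Multi-owner Device Groups --
Device Group Online Status
------------ -------------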
------------------------------------------------------------------
-- Resource Groups and Resources --
Group Name Resources
---------- ---------
Resources: PKGJLJYFX PKGJLJYFX-IP PKGJLJYFX-FS PKGJLJYFX-ORA PKGJLJYFX-LSNR
Resources: PKGJLNMS PKGJLNMS-IP PKGJLNMS-FS PKGJLNMS-ORA PKGJLNMS-LSNR
-- Resource Groups --
Group Name Node Name State Suspended
---------- --------- ----- ---------
Group: PKGJLJYFX JL-ESN-NMS-SVR-DB1-S Offline No
Group: PKGJLJYFX JL-ESN-NMS-SVR-APP1-S Offline No
Group: PKGJLNMS JL-ESN-NMS-SVR-APP1-S Offline No
Group: PKGJLNMS JL-ESN-NMS-SVR-DB1-S Offline No
-- Resources --
Resource Name Node Name State Status Message
------------- --------- ----- --------------
Resource: PKGJLJYFX-IP JL-ESN-NMS-SVR-DB1-S Offline Offline - LogicalHostname offline.
Resource: PKGJLJYFX-IP JL-ESN-NMS-SVR-APP1-S Offline Offline
Resource: PKGJLJYFX-FS JL-ESN-NMS-SVR-DB1-S Offline Offline
Resource: PKGJLJYFX-FS JL-ESN-NMS-SVR-APP1-S Offline Offline
Resource: PKGJLJYFX-ORA JL-ESN-NMS-SVR-DB1-S Offline Offline
Resource: PKGJLJYFX-ORA JL-ESN-NMS-SVR-APP1-S Offline Offline
Resource: PKGJLJYFX-LSNR JL-ESN-NMS-SVR-DB1-S Offline Offline
Resource: PKGJLJYFX-LSNR JL-ESN-NMS-SVR-APP1-S Offline Offline
Resource: PKGJLNMS-IP JL-ESN-NMS-SVR-APP1-S Offline Offline
Resource: PKGJLNMS-IP JL-ESN-NMS-SVR-DB1-S Offline Offline - LogicalHostname offline.
Resource: PKGJLNMS-FS JL-ESN-NMS-SVR-APP1-S Offline Offline
Resource: PKGJLNMS-FS JL-ESN-NMS-SVR-DB1-S Offline Offline
Resource: PKGJLNMS-ORA JL-ESN-NMS-SVR-APP1-S Offline Offline
Resource: PKGJLNMS-ORA JL-ESN-NMS-SVR-DB1-S Offline Offline
Resource: PKGJLNMS-LSNR JL-ESN-NMS-SVR-APP1-S Offline Offline
Resource: PKGJLNMS-LSNR JL-ESN-NMS-SVR-DB1-S Offline Offline
------------------------------------------------------------------
-- IPMP Groups --
Node Name Group Status Adapter Status
--------- ----- ------ ------- ------
IPMP Group: JL-ESN-NMS-SVR-DB1-S sc_ipmp0 Online nxge1 Online
IPMP Group: JL-ESN-NMS-SVR-DB1-S sc_ipmp0 Online nxge0 Online
-- IPMP Groups in Zones --
Zone Name Group Status Adapter Status
--------- ----- ------ ------- ------
------------------------------------------------------------------
more /etc/vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/md/dsk/d4 - - swap - no -
/dev/md/dsk/d0 /dev/md/rdsk/d0 / ufs 1 no -
/dev/md/dsk/d5 /dev/md/rdsk/d5 /var ufs 1 no -
#/dev/md/dsk/d6 /dev/md/rdsk/d6 /globaldevices ufs 2 yes -
/dev/md/dsk/d3 /dev/md/rdsk/d3 /opt ufs 2 yes -
/dev/md/dsk/d1 /dev/md/rdsk/d1 /temp ufs 2 yes -
/devices - /devices devfs - no -
sharefs - /etc/dfs/sharetab sharefs - no -
ctfs - /system/contract ctfs - no -
objfs - /system/object objfs - no -
swap - /tmp tmpfs - yes -
/dev/md/dsk/d6 /dev/md/rdsk/d6 /global/.devices/node@1 ufs 2 no global
/dev/md/PKGJLJYFX/dsk/d200 /dev/md/PKGJLJYFX/rdsk/d200 /opt/JFAPP ufs - no logging
/dev/md/PKGJLJYFX/dsk/d201 /dev/md/PKGJLJYFX/rdsk/d201 /opt/JFDB ufs - no logging
/dev/md/PKGJLNMS/dsk/d202 /dev/md/PKGJLNMS/rdsk/d202 /opt/NMSDB ufs - no logging
/dev/md/PKGJLNMS/dsk/d203 /dev/md/PKGJLNMS/rdsk/d203 /opt/NMSAPP ufs - no logging
/dev/md/PKGJLJYFX/dsk/d300 /dev/md/PKGJLJYFX/rdsk/d300 /opt/JFDB_BK ufs - no logging
/dev/md/PKGJLNMS/dsk/d400 /dev/md/PKGJLNMS/rdsk/d400 /opt/NMSDB_BK ufs - no logging
ufs - no logging /dev/md/PKGJLNMS/rdsk/d401 /opt/APP_BK --More--(98%)
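To see at a glance which file systems depend on which disk set, the complete vfstab entries above can be grouped by set name. This is a rough Python sketch: the helper name is hypothetical, it assumes the usual seven whitespace-separated vfstab fields, and it leaves out the last entry because its first column was lost in the capture.

```python
from collections import defaultdict

# Disk set entries copied from the vfstab listing above (complete lines only).
VFSTAB = """\
/dev/md/dsk/d3 /dev/md/rdsk/d3 /opt ufs 2 yes -
/dev/md/PKGJLJYFX/dsk/d200 /dev/md/PKGJLJYFX/rdsk/d200 /opt/JFAPP ufs - no logging
/dev/md/PKGJLJYFX/dsk/d201 /dev/md/PKGJLJYFX/rdsk/d201 /opt/JFDB ufs - no logging
/dev/md/PKGJLNMS/dsk/d202 /dev/md/PKGJLNMS/rdsk/d202 /opt/NMSDB ufs - no logging
/dev/md/PKGJLNMS/dsk/d203 /dev/md/PKGJLNMS/rdsk/d203 /opt/NMSAPP ufs - no logging
/dev/md/PKGJLJYFX/dsk/d300 /dev/md/PKGJLJYFX/rdsk/d300 /opt/JFDB_BK ufs - no logging
/dev/md/PKGJLNMS/dsk/d400 /dev/md/PKGJLNMS/rdsk/d400 /opt/NMSDB_BK ufs - no logging
"""


def mounts_by_diskset(text):
    """Map each named disk set to its (metadevice, mount point) pairs."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0].startswith("#"):
            continue  # skip comments and incomplete lines
        parts = fields[0].split("/")
        # Named-set devices look like /dev/md/<set>/dsk/<dNNN>;
        # local metadevices (/dev/md/dsk/<dN>) have one component fewer.
        if len(parts) == 6 and parts[1:3] == ["dev", "md"] and parts[4] == "dsk":
            groups[parts[3]].append((parts[5], fields[2]))
    return dict(groups)


for diskset, mounts in mounts_by_diskset(VFSTAB).items():
    print(diskset, mounts)
```

Each of the two sets carries three of the listed file systems, so both PKGJLJYFX and PKGJLNMS would need to be usable on node 1 before everything can be mounted by hand.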
format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848> solaris
/pci@400/pci@0/pci@8/scsi@0/sd@0,0
1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848> solaris
/pci@400/pci@0/pci@8/scsi@0/sd@1,0
2. c4t600A0B800033804C000001ED49E6B4C3d0 <SUN-CSM200_R-0710 cyl 38398 alt 2 hd 256 sec 64>
/scsi_vhci/ssd@g600a0b800033804c000001ed49e6b4c3
3. c4t600A0B800033804C000001F049E6B518d0 <SUN-CSM200_R-0710 cyl 63998 alt 2 hd 256 sec 64>
/scsi_vhci/ssd@g600a0b800033804c000001f049e6b518
4. c4t600A0B800033804C000001F249E6B59Cd0 <SUN-CSM200_R-0710 cyl 63998 alt 2 hd 256 sec 64>
/scsi_vhci/ssd@g600a0b800033804c000001f249e6b59c
5. c4t600A0B800033804C000001F449E6B603d0 <SUN-CSM200_R-0710 cyl 63998 alt 2 hd 256 sec 64>
/scsi_vhci/ssd@g600a0b800033804c000001f449e6b603
6. c4t600A0B800033804C000001F649E6B642d0 <SUN-CSM200_R-0710 cyl 63998 alt 2 hd 256 sec 64>
/scsi_vhci/ssd@g600a0b800033804c000001f649e6b642
7. c4t600A0B800033804C000001F849E6B695d0 <SUN-CSM200_R-0710 cyl 38398 alt 2 hd 128 sec 64>
/scsi_vhci/ssd@g600a0b800033804c000001f849e6b695
8. c4t600A0B800033804C0000020B49E79B9Fd0 <SUN-CSM200_R-0710 cyl 41598 alt 2 hd 512 sec 64>
/scsi_vhci/ssd@g600a0b800033804c0000020b49e79b9f
9. c4t600A0B800033804C0000020449E79A3Fd0 <SUN-CSM200_R-0710 cyl 51198 alt 2 hd 512 sec 64>
/scsi_vhci/ssd@g600a0b800033804c0000020449e79a3f
10. c4t600A0B800033804C0000020749E79A84d0 <SUN-CSM200_R-0710 cyl 41598 alt 2 hd 512 sec 64>
/scsi_vhci/ssd@g600a0b800033804c0000020749e79a84
11. c4t600A0B800033804C0000020949E79AD1d0 <SUN-CSM200_R-0710 cyl 63998 alt 2 hd 256 sec 64>
/scsi_vhci/ssd@g600a0b800033804c0000020949e79ad1
metastat -s PKGJLJYFX
PKGJLJYFX/d300: Concat/Stripe
Size: 1677623296 blocks (799 GB)
Stripe 0:
Device Start Block Dbase Reloc
d9s0 0 No No
PKGJLJYFX/d201: Concat/Stripe
Size: 2097053696 blocks (999 GB)
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase Reloc
d12s0 0 No No
d11s0 0 No No
PKGJLJYFX/d200: Concat/Stripe
Size: 629096448 blocks (299 GB)
Stripe 0:
Device Start Block Dbase Reloc
d13s0 0 No No
Jul 3 15:11:11 JL-ESN-NMS-SVR-DB1-S Cluster.CCR: symlink failed for /dev/md/shared/1/rdsk/d300 -> ../../../../../devices/pseudo/md@0:1,300,raw: I/O error
Jul 3 15:11:11 JL-ESN-NMS-SVR-DB1-S Cluster.CCR: symlink failed for /dev/md/shared/1/dsk/d300 -> ../../../../../devices/pseudo/md@0:1,300,blk: I/O error
The node was rebooted once after this point.
metastat -s PKGJLJYFX
PKGJLJYFX/d300: Concat/Stripe
Size: 1677623296 blocks (799 GB)
Stripe 0:
Device Start Block Dbase Reloc
d9s0 0 No No
PKGJLJYFX/d201: Concat/Stripe
Size: 2097053696 blocks (999 GB)
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase Reloc
d12s0 0 No No
d11s0 0 No No
PKGJLJYFX/d200: Concat/Stripe
Size: 629096448 blocks (299 GB)
Stripe 0:
Device Start Block Dbase Reloc
d13s0 0 No No
Jul 3 15:41:54 JL-ESN-NMS-SVR-DB1-S Cluster.CCR: mkdir failed for /dev/md/shared/1/rdsk 0x1ed: No such file or directory
Jul 3 15:41:54 JL-ESN-NMS-SVR-DB1-S last message repeated 1 time
Jul 3 15:41:54 JL-ESN-NMS-SVR-DB1-S Cluster.CCR: cannot create link: /dev/md/shared/1/rdsk/d300 -> ../../../../../devices/pseudo/md@0:1,300,raw. max attempts exceeded
Jul 3 15:41:54 JL-ESN-NMS-SVR-DB1-S Cluster.CCR: mkdir failed for /dev/md/shared/1/dsk 0x1ed: No such file or directory
Jul 3 15:41:54 JL-ESN-NMS-SVR-DB1-S last message repeated 1 time
Jul 3 15:41:54 JL-ESN-NMS-SVR-DB1-S Cluster.CCR: cannot create link: /dev/md/shared/1/dsk/d300 -> ../../../../../devices/pseudo/md@0:1,300,blk. max attempts exceeded
mount /dev/md/PKGJLJYFX/dsk/d300 /opt/JFDB_BK
mount: /dev/md/PKGJLJYFX/dsk/d300 or /opt/JFDB_BK, no such file or directory
root@JL-ESN-NMS-SVR-DB1-S # cd /dev/md
root@JL-ESN-NMS-SVR-DB1-S # ls
PKGJLJYFX PKGJLNMS admin dsk rdsk shared
root@JL-ESN-NMS-SVR-DB1-S # cd PKGL
bash: cd: PKGL: No such file or directory
root@JL-ESN-NMS-SVR-DB1-S # cd PKGJLJYFX
bash: cd: PKGJLJYFX: No such file or directory
root@JL-ESN-NMS-SVR-DB1-S # ls -al
total 24
drwxr-xr-x 4 root root 512 Jul 3 15:37 .
drwxr-xr-x 20 root sys 4608 Jul 3 16:15 ..
lrwxrwxrwx 1 root root 8 Jul 3 15:37 PKGJLJYFX -> shared/1
lrwxrwxrwx 1 root other 8 Apr 17 2009 PKGJLNMS -> shared/2
lrwxrwxrwx 1 root root 31 Apr 14 2009 admin -> ../../devices/pseudo/md@0:admin
drwxr-xr-x 2 root root 512 Apr 17 2009 dsk
drwxr-xr-x 2 root root 512 Apr 17 2009 rdsk
lrwxrwxrwx 1 root root 42 Apr 15 2009 shared -> ../../global/.devices/node@2/dev/md/shared
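The failed `cd` and `mount` may be easier to read once the symlink chain in this listing is followed to its end. The Python sketch below does that purely on paper: it uses only the three links shown in the listing, resolves paths lexically, and assumes /dev and /dev/md are ordinary directories. The `resolve` helper is hypothetical and never touches a real file system.

```python
import posixpath

# Symbolic links copied from the /dev/md listing above.
LINKS = {
    "/dev/md/PKGJLJYFX": "shared/1",
    "/dev/md/PKGJLNMS": "shared/2",
    "/dev/md/shared": "../../global/.devices/node@2/dev/md/shared",
}


def resolve(path, links, max_hops=16):
    """Lexically expand every known symlink that appears along `path`."""
    path = posixpath.normpath(path)
    for _ in range(max_hops):
        parts = path.split("/")[1:]
        for i in range(1, len(parts) + 1):
            prefix = "/" + "/".join(parts[:i])
            if prefix in links:
                # Relative targets are interpreted from the link's own directory.
                target = posixpath.join(posixpath.dirname(prefix), links[prefix])
                path = posixpath.normpath(posixpath.join(target, *parts[i:]))
                break
        else:
            return path  # no component is a known link any more
    raise RuntimeError("too many levels of symbolic links")


print(resolve("/dev/md/PKGJLJYFX/dsk/d300", LINKS))
# /global/.devices/node@2/dev/md/shared/1/dsk/d300
```

So on node 1 the device path for d300 ends up under /global/.devices/node@2, while the vfstab above only mounts /global/.devices/node@1 locally. If the node@2 file system is served by the node that is down, that could account for the "No such file or directory" errors, though this is an inference from the listing rather than something the logs confirm.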
Below is the boot output from node 2:
T5240, No Keyboard
Copyright (c) 1998, 2011, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.33.0.b, 32544 MB memory available, Serial #86443024.
Ethernet address 0:21:28:27:4:10, Host ID: 85270410.
Boot device: /pci@400/pci@0/pci@8/scsi@0/disk@0,0:a File and args: -a
Name of system file [/etc/system]:
SunOS Release 5.10 Version Generic_138888-03 64-bit
Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
root filesystem type [ufs]:
Enter physical name of root device
[/pci@400/pci@0/pci@8/scsi@0/disk@0,0:a]:
Hostname: unknown
WARNING: The following files in / differ from the boot archive:
new /platform/sun4v/lib/fs/hsfs/bootblk
new /platform/sun4v/lib/fs/ufs/bootblk
new /platform/sun4v/lib/fs/zfs/bootblk
new /platform/sun4v/lib/libc_psr/libc_psr_hwcap1.so.1
new /platform/sun4v/lib/libc_psr/libc_psr_hwcap2.so.1
new /platform/sun4v/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
new /platform/sun4v/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1
new /platform/sun4v/lib/sparcv9/libc_psr.so.1
new /platform/sun4v/lib/sparcv9/libmd_psr.so.1
new /platform/sun4v/lib/libc_psr.so.1
new /platform/sun4v/lib/libmd_psr.so.1
new /platform/sun4v/bootlst
new /platform/sun4v/wanboot
new /platform/sun4v/failsafe
new /platform/sun4v/boot_archive
new /platform/sun4u/kernel/crypto/sparcv9/arcfour2048
new /platform/sun4u/kernel/kmdb/sparcv9/sgenv
new /platform/sun4u/kernel/kmdb/sparcv9/sgsbbc
new /platform/sun4u/kernel/kmdb/sparcv9/unix
new /platform/sun4u/kernel/kmdb/sparcv9/wrsm
new /platform/sun4u/kernel/kmdb/sparcv9/wrsmd
new /platform/sun4u/kernel/brand/sparcv9/s8_brand
new /platform/sun4u/kernel/brand/sparcv9/s9_brand
new /platform/sun4u/failsafe
new /platform/sun4us/failsafe
The recommended action is to reboot to the failsafe archive to correct
the above inconsistency. To accomplish this, on a GRUB-based platform,
reboot and select the "Solaris failsafe" option from the boot menu.
On an OBP-based platform, reboot then type "boot -F failsafe". Then
follow the prompts to update the boot archive. Alternately, to continue
booting at your own risk, you may clear the service by running:
"svcadm clear system/boot-archive"
Jul 3 17:25:10 svc.startd[8]: svc:/system/boot-archive:default: Method "/lib/svc/method/boot-archive" failed with exit status 95.
Jul 3 17:25:10 svc.startd[8]: system/boot-archive:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run
Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode
Any ideas on how to fix this would be much appreciated. Thanks, everyone!