Last edited by rand1985 on 2013-12-27 18:09
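Environment:
master server: AIX 6.1
client: Windows 2008 R2 (clustered)
NBU: 7.1.0.2
There is a firewall with NAT between the master and the client.
The backup is file-level: about 510 GB in total, roughly 1.3 million files.
During the backup the job fails with status codes 13, 24, 42, and 58,
most often 24 and 42.
In testing, small folders back up successfully, but large folders never complete; they fail with status 24, 42, or 13.
bpbkar log on the client (excerpt):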
15:59:39.069: [5992.7116] <2> TransporterRemote::write[2](): DBG - | An Exception of type [SocketWriteException] has occured at: | Module: @(#) $Source: src/ncf/tfi/lib/TransporterRemote.cpp,v $ $Revision: 1.54 $ , Function: TransporterRemote::write[2](), Line: 321 | Local Address: [::]:0 | Remote Address: [::]:0 | OS Error: 10054 (An existing connection was forcibly closed by the remote host.) | Expected bytes: 16384 | (../TransporterRemote.cpp:321)
15:59:39.069: [5992.7116] <16> tar_tfi::processException:
An Exception of type [SocketWriteException] has occured at:
Module: @(#) $Source: src/ncf/tfi/lib/TransporterRemote.cpp,v $ $Revision: 1.54 $ , Function: TransporterRemote::write[2](), Line: 321
Module: @(#) $Source: src/ncf/tfi/lib/Packer.cpp,v $ $Revision: 1.89 $ , Function: Packer::getBuffer(), Line: 656
Module: tar_tfi::getBuffer, Function: H:\71\src\cl\clientpc\util\tar_tfi.cpp, Line: 312
Local Address: [::]:0
Remote Address: [::]:0
OS Error: 10054 (An existing connection was forcibly closed by the remote host.)
Expected bytes: 16384
Job details:
2013-12-25 16:08:58 - Info nbjm (pid=7733264) starting backup job (jobid=59994) for client hzcmbdfs, policy hzcmbdfs, schedule hzcmbdfs-auto
2013-12-25 16:08:58 - Info nbjm (pid=7733264) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=59994, request id:{CEB1A308-6D3B-11E3-9066-759916130000})
2013-12-25 16:08:58 - requesting resource hz_nbu-hcart2-robot-tld-0
2013-12-25 16:08:58 - requesting resource hz_nbu.NBU_CLIENT.MAXJOBS.hzcmbdfs
2013-12-25 16:08:58 - requesting resource hz_nbu.NBU_POLICY.MAXJOBS.hzcmbdfs
2013-12-25 16:08:59 - granted resource hz_nbu.NBU_CLIENT.MAXJOBS.hzcmbdfs
2013-12-25 16:08:59 - granted resource hz_nbu.NBU_POLICY.MAXJOBS.hzcmbdfs
2013-12-25 16:08:59 - granted resource HZ0019
2013-12-25 16:08:59 - granted resource HP.ULTRIUM5-SCSI.000
2013-12-25 16:08:59 - granted resource hz_nbu-hcart2-robot-tld-0
2013-12-25 16:08:59 - estimated 505568104 kbytes needed
2013-12-25 16:08:59 - Info nbjm (pid=7733264) started backup job for client hzcmbdfs, policy hzcmbdfs, schedule hzcmbdfs-auto on storage unit hz_nbu-hcart2-robot-tld-0
2013-12-25 16:08:59 - started process bpbrm (pid=8061202)
2013-12-25 16:09:04 - Info bpbrm (pid=8061202) hzcmbdfs is the host to backup data from
2013-12-25 16:09:04 - Info bpbrm (pid=8061202) reading file list from client
2013-12-25 16:09:04 - connecting
2013-12-25 16:09:07 - Info bpbrm (pid=8061202) starting bpbkar on client
2013-12-25 16:09:07 - connected; connect time: 0:00:00
2013-12-25 16:09:09 - Info bpbkar (pid=4520) Backup started
2013-12-25 16:09:09 - Info bpbrm (pid=8061202) bptm pid: 6422646
2013-12-25 16:09:09 - Info bptm (pid=6422646) start
2013-12-25 16:09:09 - Info bptm (pid=6422646) using 65536 data buffer size
2013-12-25 16:09:09 - Info bptm (pid=6422646) using 30 data buffers
2013-12-25 16:09:09 - Info bptm (pid=6422646) start backup
2013-12-25 16:09:09 - Info bptm (pid=6422646) backup child process is pid 8323386
2013-12-25 16:09:09 - Info bptm (pid=6422646) Waiting for mount of media id HZ0019 (copy 1) on server hz_nbu.
2013-12-25 16:09:09 - mounting HZ0019
2013-12-25 16:09:56 - Info bptm (pid=6422646) media id HZ0019 mounted on drive index 0, drivepath /dev/rmt0.1, drivename HP.ULTRIUM5-SCSI.000, copy 1
2013-12-25 16:09:56 - mounted HZ0019; mount time: 0:00:47
2013-12-25 16:09:56 - positioning HZ0019 to file 31
2013-12-25 16:11:38 - positioned HZ0019; position time: 0:01:42
2013-12-25 16:11:38 - begin writing
2013-12-25 16:22:04 - Error bptm (pid=8323386) system call failed - A connection with a remote socket was reset by that socket. (at child.c.1295)
2013-12-25 16:22:04 - Error bptm (pid=8323386) unable to perform read from client socket, connection may have been broken
2013-12-25 16:22:04 - Critical bpbrm (pid=8061202) from client hzcmbdfs: FTL - socket write failed
2013-12-25 16:22:46 - Info bptm (pid=6422646) EXITING with status 42 <----------
2013-12-25 16:22:46 - Error bpbrm (pid=8061202) could not send server status message
2013-12-25 16:22:48 - Info bpbkar (pid=4520) done. status: 42: network read failed
2013-12-25 16:22:48 - end writing; write time: 0:11:10
network read failed (42)
******
******
2013-12-25 15:02:50 - Info nbjm (pid=7733264) starting backup job (jobid=59993) for client hzcmbdfs, policy hzcmbdfs, schedule hzcmbdfs-auto
2013-12-25 15:02:50 - Info nbjm (pid=7733264) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=59993, request id:{91600EBC-6D32-11E3-9069-61546DD60000})
2013-12-25 15:02:50 - requesting resource hz_nbu-hcart2-robot-tld-0
2013-12-25 15:02:50 - requesting resource hz_nbu.NBU_CLIENT.MAXJOBS.hzcmbdfs
2013-12-25 15:02:50 - requesting resource hz_nbu.NBU_POLICY.MAXJOBS.hzcmbdfs
2013-12-25 15:02:50 - granted resource hz_nbu.NBU_CLIENT.MAXJOBS.hzcmbdfs
2013-12-25 15:02:50 - granted resource hz_nbu.NBU_POLICY.MAXJOBS.hzcmbdfs
2013-12-25 15:02:50 - granted resource HZ0019
2013-12-25 15:02:50 - granted resource HP.ULTRIUM5-SCSI.003
2013-12-25 15:02:50 - granted resource hz_nbu-hcart2-robot-tld-0
2013-12-25 15:02:50 - estimated 505568104 kbytes needed
2013-12-25 15:02:50 - Info nbjm (pid=7733264) started backup job for client hzcmbdfs, policy hzcmbdfs, schedule hzcmbdfs-auto on storage unit hz_nbu-hcart2-robot-tld-0
2013-12-25 15:02:50 - started process bpbrm (pid=9044320)
2013-12-25 15:02:55 - Info bpbrm (pid=9044320) hzcmbdfs is the host to backup data from
2013-12-25 15:02:55 - Info bpbrm (pid=9044320) reading file list from client
2013-12-25 15:02:55 - connecting
2013-12-25 15:03:05 - Info bpbrm (pid=9044320) starting bpbkar on client
2013-12-25 15:03:05 - connected; connect time: 0:00:00
2013-12-25 15:03:07 - Info bpbkar (pid=2656) Backup started
2013-12-25 15:03:07 - Info bpbrm (pid=9044320) bptm pid: 9175096
2013-12-25 15:03:08 - Info bptm (pid=9175096) start
2013-12-25 15:03:08 - Info bptm (pid=9175096) using 65536 data buffer size
2013-12-25 15:03:08 - Info bptm (pid=9175096) using 30 data buffers
2013-12-25 15:03:08 - Info bptm (pid=9175096) start backup
2013-12-25 15:03:08 - Info bptm (pid=9175096) backup child process is pid 8978612
2013-12-25 15:03:08 - Info bptm (pid=9175096) Waiting for mount of media id HZ0019 (copy 1) on server hz_nbu.
2013-12-25 15:03:08 - mounting HZ0019
2013-12-25 15:04:10 - Info bptm (pid=9175096) media id HZ0019 mounted on drive index 4, drivepath /dev/rmt3.1, drivename HP.ULTRIUM5-SCSI.003, copy 1
2013-12-25 15:04:10 - mounted HZ0019; mount time: 0:01:02
2013-12-25 15:04:10 - positioning HZ0019 to file 31
2013-12-25 15:05:50 - positioned HZ0019; position time: 0:01:40
2013-12-25 15:05:50 - begin writing
2013-12-25 15:21:04 - Error bptm (pid=8978612) system call failed - A connection with a remote socket was reset by that socket. (at child.c.1295)
2013-12-25 15:21:04 - Critical bpbrm (pid=9044320) from client hzcmbdfs: FTL - socket write failed
2013-12-25 15:21:04 - Error bptm (pid=8978612) unable to perform read from client socket, connection may have been broken
2013-12-25 15:21:04 - Error bptm (pid=9175096) media manager terminated by parent process
2013-12-25 15:22:03 - Error bpbrm (pid=9044320) could not send server status message
2013-12-25 15:22:05 - Info bpbkar (pid=2656) done. status: 24: socket write failed
2013-12-25 15:22:05 - end writing; write time: 0:16:15
socket write failed (24)
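For comparison, here is a rough Python sketch (not a NetBackup tool) that works out how long each job had been writing before the first Error line appeared. The helper name `seconds_until_first_error` is made up for illustration, and the input lines are copied from the two job details above.

```python
from datetime import datetime

# Timestamp layout used in the job details above.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def seconds_until_first_error(job_lines):
    """Seconds between 'begin writing' and the first Error line, or None.

    Hypothetical helper: it only understands lines of the form
    '<timestamp> - <message>' as they appear in the job details.
    """
    start = None
    for line in job_lines:
        stamp, sep, message = line.partition(" - ")
        if not sep:
            continue  # e.g. the trailing 'network read failed (42)' line
        try:
            when = datetime.strptime(stamp.strip(), TS_FORMAT)
        except ValueError:
            continue  # not a timestamped line
        if message.startswith("begin writing"):
            start = when
        elif start is not None and message.startswith("Error"):
            return int((when - start).total_seconds())
    return None


job_59994 = [
    "2013-12-25 16:11:38 - begin writing",
    "2013-12-25 16:22:04 - Error bptm (pid=8323386) system call failed - "
    "A connection with a remote socket was reset by that socket. (at child.c.1295)",
]
job_59993 = [
    "2013-12-25 15:05:50 - begin writing",
    "2013-12-25 15:21:04 - Error bptm (pid=8978612) system call failed - "
    "A connection with a remote socket was reset by that socket. (at child.c.1295)",
]

for job_id, lines in (("59994", job_59994), ("59993", job_59993)):
    print(job_id, seconds_until_first_error(lines))
```

For these two jobs the connection was reset after 626 seconds (about 10.5 minutes) and 914 seconds (about 15 minutes) of writing, so the failure does not happen at one fixed offset.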
Any advice would be appreciated!