management switch plugged into wrong subnet |
cluster members unable to fence each other to regain quorum, and admins unable to access server consoles |
move IPMI interface to CESR SAN subnet to reduce external dependencies; clearly label switch ports |
port reconfigured for correct subnet |
name services dependent on functioning cluster |
name resolution delays or failures |
deploy standalone DNS server in PSB |
|
NIS dependent on functioning cluster |
users, groups, and file systems unavailable |
replace NIS services with LDAP for SL6 clients |
deploy standalone DNS server in PSB |
NIS client functionality depends on functioning NIS server at boot |
users, groups, and file systems unavailable until ypbind restarted |
replace NIS services with LDAP for SL6 clients |
investigate options for improving ypbind startup procedures and behavior |
Cluster recovery procedure is labor-intensive and intellectually demanding |
long recovery time from cluster failures |
upgrade cluster protocols to make full use of the virtual synchrony guarantees of the underlying protocol stack (possibly only in SL7) |
test and produce procedure for quick cluster recovery; reproduce failures and submit reports to cluster developers and upstream vendor |
CLASSE Kerberos tickets from offsite (Red Rover) depend on clustered VM |
unable to log in to CLASSE systems from offsite in certain cases |
open up ports to allow Kerberos tickets from any domain controller |
|
Console access to servers over IPMI depends on a web browser, Java, and the network management subnet |
blocked or slow access to server consoles |
enable console redirection on servers to access console over serial in addition to IPMI |
|
Even number of CLASSE cluster members increases likelihood of split-brain (dissolved quorum) in certain situations |
cluster services and protocols blocked |
add server to cluster to bring to odd number of members |
|
Legacy name servers came up with corrupt zones |
name resolution delays or failures |
replace legacy servers; consider graceful failover of name services |
|
DEC Alphas display Windows NT BIOS instead of VMS/Unix BIOS |
affected Alpha cannot boot |
replace VMS Alphas with Linux cluster members |
replace motherboard battery, then use graphics BIOS to restore BIOS setting |
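The row on even cluster membership can be illustrated with a little majority arithmetic. The sketch below assumes one vote per member, a strict-majority quorum rule, and no tie-breaker such as a quorum disk; the actual CLASSE cluster configuration may differ, and the function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Votes needed for a strict majority, assuming one vote per member."""
    return members // 2 + 1


def has_deadlock_split(members: int) -> bool:
    """True if some two-way partition leaves neither side with quorum."""
    need = quorum_size(members)
    # One side holds k members, the other side holds the remainder.
    return any(k < need and members - k < need for k in range(members + 1))


if __name__ == "__main__":
    for n in (4, 5):
        print(f"{n} members: quorum={quorum_size(n)}, "
              f"deadlock split possible={has_deadlock_split(n)}")
```

Under these assumptions a 4-member cluster can split 2/2 with neither half reaching the 3 votes it needs, while any two-way split of a 5-member cluster leaves one side with quorum, which is the motivation for adding a server to reach an odd count.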