
CLASSE Network and System Status


This page contains information on issues currently under investigation. If you know of or are experiencing any other problems, please contact the Computer Group.

Please see the page NetworkStatusArchive2 for a record of previous years' status messages.


Tuesday December 18th, 2018

The central VM cluster had a failure. Various lab services depending on VMs are down.
  • 8AM - notices generated:
    • Various IT services are hung that run in our central VM cluster, including the lab print server. Printing is down.
    • Additional services that may be affected include WinApp2 (the CLASSE Windows terminal server), RSLogixApp, etc.
    • Remote access via Screenconnect may also be affected.
  • 8:49AM - services restored. VM Cluster is back up.

Friday November 16th, 2018

The VPN certificate authority certificate expired and had to be reissued. VPN users will need to update their installation with the new CA.pem. For user-managed systems, instructions and the CA.pem can be found on the CLASSE OpenVPN page.
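For user-managed systems, one way to see how long a CA certificate has left is to compare its expiry date with the current time. The sketch below is illustrative only, not the Computer Group's procedure. It assumes you have already obtained the certificate's `notAfter` string, for example from the output of `openssl x509 -enddate -noout -in CA.pem`; the function name `days_until_expiry` and the sample date are hypothetical.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the number of days until a certificate's notAfter time.

    not_after: expiry string in the form OpenSSL prints, e.g.
               "Nov 16 12:00:00 2018 GMT" (sample value, not the real CA date).
    now:       current time in seconds since the epoch (defaults to time.time()).
    A negative result means the certificate has already expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)  # stdlib parser for this format
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


if __name__ == "__main__":
    # Hypothetical expiry string; replace with the value from your own CA.pem.
    print(days_until_expiry("Nov 16 12:00:00 2018 GMT"))
```

A result near or below zero suggests the installed CA.pem should be replaced with the reissued one.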

Thursday March 5th, 2018

  • ~10:30AM - CLASSE Central VM cluster begins having issues with rebooting central virtual machines.
  • ~11:15AM - Current investigation shows that VMs come back once the guest OS reboots. So far the reboots appear to be clean shutdowns and restarts, and the only effect is random outages while each reboot occurs. We are continuing the investigation.
  • 11:17AM - Potential resolution identified and test fix application started.
  • 11:40AM - Fix applied and now starting underlying Host updates. There will still be pauses to VMs as they are migrated.
  • 12:22PM - Fix seems to be going smoothly and VMs seem stable. We do not expect any more user facing issues.

Sunday, 5 AM, January 7, 2018

Between 5AM and 8AM on Sunday, January 7, 2018, connectivity to the outside world and among CLASSE buildings (Wilson, Newman, PSB, Annex) will be intermittently unavailable. Cornell's central network service organization will be upgrading firmware in all of their "edge" switches.

Friday, 9 AM, November 17, 2017

The server for /nfs/acc/libs, acc101, failed on Thursday, with the computer group notified Friday morning. We have migrated the CESR library disk contents to the high-availability CLASSE cluster. Systems with /nfs/acc/libs still mounted from the failed hardware may need to be rebooted to pick up the new location.
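To decide whether a given Linux system still has /nfs/acc/libs mounted from the failed server, you can look at which host the mount table lists for that path. The following is a minimal sketch under stated assumptions: it parses text in the `/proc/mounts` format, the function names are hypothetical, and the export paths in the usage comment are made up for illustration. Only the mount point and the host name acc101 come from the notice above.

```python
def mount_source(mounts_text, mount_point):
    """Return the source field (e.g. "host:/export") for mount_point, or None."""
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: source, mount point, fs type, options, ...
        if len(fields) >= 2 and fields[1] == mount_point:
            source = fields[0]  # keep the last match (the most recent mount)
    return source


def mounted_from_failed_host(mounts_text,
                             mount_point="/nfs/acc/libs",
                             failed_host="acc101"):
    """True if mount_point is served by failed_host (short or full host name)."""
    source = mount_source(mounts_text, mount_point)
    if source is None or ":" not in source:
        return False
    host = source.split(":", 1)[0]
    return host.split(".")[0] == failed_host


# Usage on a live system (reading /proc/mounts does not touch the NFS mount):
#     with open("/proc/mounts") as f:
#         print(mounted_from_failed_host(f.read()))
```

If the function returns True, the system is a candidate for the reboot described above.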

Wednesday, 3:49 PM, November 15, 2017

Incoming telephone calls to Cornell are not working.
  • Both AT&T and Verizon have confirmed that this issue is part of a larger outage beyond Cornell. They have not indicated when it will be fixed.

Thursday, 12:20 PM, 16-Nov-2017

Here are some workarounds suggested by CIT:

  • Emergency: 911 or use a Blue Light phone
  • Cornell Police: For any other police service or urgent campus matter. From a campus phone, call 607-255-1111. From a cell or off-campus phone, 607-257-9296 or 607-257-9297.
  • Cornell Health - For routine appointments, schedule online http://mycornellhealth.health.cornell.edu/ or wait to call when the outage has been corrected.
  • Cornell Health - For urgent needs, from a campus phone, 607-255-5155 during office hours. From a cell or off-campus phone (and from all phones after hours), 607-330-5037. Cornell Health’s answering service will take your name and number, and a Cornell Health staff person will call you back as soon as possible.
  • Vet hospitals: 607-253-3060. If that number doesn't work, 607-342-0460 or vet-hosp@cornell.edu
  • IPP Customer Service: From a campus phone, 607-255-5322. From a cell or off-campus phone, 607-342-7387.
  • Campus Life (Dining, Housing, Community Center Operations, and Conference Services): From a campus phone, 607-255-5368. From a cell or off-campus phone, 607-793-1508 or 607-793-1332.
  • Other Cornell groups: Visit their website to see if they’ve posted other ways to contact them.

  • Two-Step Login: If you normally use your campus phone as the verification step, switch to the backup method you set up. The IT Service Desk https://it.cornell.edu/support can help if you don't have a backup method.
  • Zoom meeting: Distribute the phone number that Zoom provides and have everyone call in that way.

Added 16:20, 16-Nov-2017:

As of 16:46, 16-Nov-2017:
  • Verizon reports that they are working to repair a fiber cut. Their estimated time to repair is within the early morning hours of Friday, 11/17.

As of 4:37 AM, 17-Nov-2017:
  • Service restored. Verizon has confirmed that service has been restored. Cornell voice engineers have verified that special forwarding that had been put in place for a few critical phone lines has been removed and those numbers are back to working as usual.

Tuesday, October 24, 2017

A campus-wide power outage at about 7:15 AM EDT took down CLASSE central computer services. All central services were restored before 1PM. One of the nodes in the central CLASSE cluster is still down, apparently due to a hardware failure. Most of the compute-farm computers are up, too.
  • If you're calling from a non-Cornell number (including cell phones), visit the website for the Cornell group you're trying to reach, to see if they have posted alternative ways to reach them.
  • Call 911 or use a Blue Light phone in case of an emergency.
  • Call Cornell Police direct lines (607-257-9296 or 607-257-9297) for any other police service or urgent campus matter.
  • To reach Cornell Health, call its answering service (607-330-5037).
  • For Two-Step Login, if you normally use your campus phone as the verification step, switch to the backup method you set up. The IT Service Desk https://it.cornell.edu/support can help if you don't have a backup method.
  • For Zoom meetings, distribute the phone number that Zoom provides and have everyone call in that way.

Current Status: AT&T has confirmed that this issue is part of a larger outage beyond Cornell. AT&T has not indicated when it will be fixed.

Tuesday, August 1, 2017

We will be performing network maintenance on Tuesday 01-Aug-2017 from 9 AM to noon. During this window, there will be 20-30 minute interruptions on various segments of our network (e.g. Public, ERL, CHESS-DAQ), which would affect access to central filesystems (a.k.a. samba), web browsing, email, CLASSE/CHESS/CBB website availability, and printing/scanning. Please contact us with any concerns.

Wednesday June 21st, 2017

  • 1:30PM - A network outage affecting some internal networks was discovered. Investigation started.
  • 1:50PM - the network outage started affecting additional networks.
  • 2:05PM - Proximate cause of outage discovered. Remediation steps discussed.
  • 2:12PM - Remediation steps implemented.
  • 2:16PM - Network outage resolved.

Thursday, April 27th, 2017

  • 8AM - We are currently investigating severe network filesystem delays - this includes some SAMBA folders.
  • 9:20AM - Services restored. SAMBA and the other network filesystems are back to normal.

Sunday, April 23, 2017: campus-wide power outage

  • Power to CLASSE buildings was lost briefly (for approx. 2 minutes) around 9:20 AM on Sunday, April 23, 2017.
  • Core CLASSE-IT servers did not lose power (because of UPSs), and all CLASSE computer services were up by 11:40 AM.

Friday, March 24th, 2017: Vault outage

  • At 8:07AM, we were notified of a Vault outage. Investigation started.
  • At 8:20AM, we confirmed the outage and began diagnostics and planning for its resolution.
  • At 8:39AM, we determined the cause of the outage and began working on a recovery method.
  • At 9:15AM, we have our Adraft consultant working on the Vault server actively doing recovery.
  • As of 11:11AM the Vault service is validated as restored.

Tuesday, February 21, 2017: network and Web glitches

  • We will be reconfiguring core CLASSE switches and restarting several services between 10AM and Noon on Tuesday.
    • Unfortunately, the interruptions were more extensive than expected. We hope services will be back to normal soon.

Tuesday, February 14, 2017: network and Web outages

  • We will be updating and restarting a core 1Gbit CLASSE switch at 10AM. This will cause general network glitches for many CLASSE Public, Protected, and other connections to desktops and laptops. We hope to complete the work within 1 hour, but it may take up to 4 hours in the worst case.
  • At approximately 10AM Tuesday, the main CLASSE, CHESS, CBB, and Xraise websites will be rebooted, as will the CLASSE wiki, SVN, and ELOG servers. Each site should only be down for a few seconds.

Sunday, February 5, 2017: power outage recovery

  • 12:38 PM. At this point, we have resolved all major problems with CLASSE services that we are aware of. Please submit a ticket with any remaining issues: CLASSE service request

Saturday, February 4, 2017: city-wide power outage

  • Power was lost to all CLASSE computer systems shortly after 5:10 PM, Saturday, February 4, 2017.
  • Power was restored to the Lab at approximately 7:05 PM.
  • Most CLASSE computer services were up by Midnight, but some systems are still down.

Wednesday, September 28, 2016: Intermittent glitches

  • All CLASSE networks again will have intermittent glitches between Noon and 4PM while additional central network devices are replaced.

Tuesday, September 20, 2016: Intermittent glitches

  • All CLASSE networks will have intermittent glitches between 10AM and 2PM while central network switches are replaced.

Thursday, June 9, 2016: Power outage in LT107 at 5:15 PM

  • A circuit breaker tripped at about 5:15 PM in the server room in the East Module (LT107). Network switches and various servers lost power. Computer Farm members are unavailable.
  • Power was restored by about 7:30 PM.

Monday, January 4, 2016: Reboot LNX201 at 11AM

  • An unscheduled maintenance reboot is needed.
  • As of 11:29AM LNX201 is available.

Wednesday, October 21, 2015: Password Self Service is down

  • PWM, the password self-service app, is currently down due to back-end security configuration issues. Time to recovery is currently unknown. For now, please stop by the helpdesk to reset your password, or call the computer group.
  • As of 4:38PM, PWM Password Self Service is available again.

Tuesday, September 15, 2015: Brief network outage at Noon

  • Scheduled network switch reconfiguration at Noon (outage expected to be very brief)
  • Network available again at 12:30, much later than anticipated.

Thursday, August 27, 2015, CLASSE VPN service Outage

  • 9:21AM: The outage is resolved. CLASSE VPN service is now restored.
  • Starting at around 8AM, the VPN service has been experiencing an issue completing connections. We are investigating, but have no ETA for restoration of service. For now, please plug in to LNS Protected cables or use alternate connection methods, including https://outlook.cornell.edu for e-mail access.

Tuesday, August 11, 2015, Partial Power Outage

  • From approximately 8 AM EDT until Noon on Tuesday, August 11, 2015, there will be a power outage in the Center and East Modules, including the trailer server room. To minimize the number of disruptions to our users, we will take this opportunity to perform some planned infrastructure upgrades and maintenance. This includes the migration of several servers and services from EM107 to the newer server rooms in Wilson and the Physical Sciences Building. For details see https://wiki.classe.cornell.edu/Computing/NewsletterAugust032015

  • We believe that all services were up again by 11AM. Please send email to service-classe@cornell.edu if you find any that we overlooked.

Tuesday, April 28, 2015, CLASSE Cluster maintenance

  • Starting at approximately Noon on Tuesday 4/28, the CLASSE Cluster was down for routine maintenance. Many networked file systems accessed over NFS and Samba, including most unix / linux home directories, were unavailable until around 3pm. In addition, many services and virtual machines were unavailable.

Tuesday, February 24, 2015

  • 10:34 AM: A power outage in LOE (required for demolition work) took down the LOE wired and wireless networks
  • 12:14 PM: power was restored

Thursday, December 18, 2014

  • The CESR online cluster is down for scheduled file system maintenance starting at 10AM. We hope it will be up by 2PM.
  • As a side-effect LNX201 and the Samba file servers have had to be restarted.
  • CESR cluster maintenance was completed at 15:30 (3:30 PM).

Thursday, December 11, 2014

Tuesday, December 9, 2014

Tuesday, October 14th, 2014

Wednesday, September 24th, 2014

  • 3:55PM - we are having network issues with the CLASSE Public network to Newman Lab and the PSB. We are investigating.
  • 4:23PM - the network issue was resolved; the CLASSE Public network VLAN has been restored to the PSB and Newman.

Tuesday, August 12th, 2014

  • 8:10 AM - Matlab license server outage. New instances of Matlab will not start currently. We are investigating.
  • 8:23 AM - License server restarted, outage is now resolved. Matlab is available as expected.

Wednesday, August 6th, 2014

  • We are still having an outage of the CLASSE Proxy server. This will affect Outlook and Firefox on CLASSE managed Windows computers, and Firefox on other computers using the proxy.
  • Proxy failure resolved at 8:47AM. You may need to restart Outlook and / or Firefox to reconnect.

Tuesday, August 5th, 2014

We had a failure of the CLASSE cluster. Many services were impacted. Service restored around 8PM.

Tuesday, June 17, 2014

  • We had a failure at 2AM of the Liebert Air Conditioner which provides half of the cooling in the Lab's central computer room, Wilson 221. The room is getting quite warm, but fans bringing in hallway air seem to be managing to keep the room from getting too hot.

If it does get too hot, we'll have to shut down some of the central server computers.

Hopefully someone will be here shortly after 6AM to work on the Air Conditioner.

  • The Liebert A/C was restored to operation shortly after 8AM.
  • Unfortunately, the repair failed. They are continuing to work on it. Loud fans are blowing air into the hallway.

Wednesday, April 9, 2014

Cornell's IT Security Office has issued alerts about the newly discovered "Heartbleed" vulnerability in recent versions of the commonly used encryption library, OpenSSL (http://www.it.cornell.edu/services/alert.cfm?id=3141 and http://www.it.cornell.edu/services/alert.cfm?id=3139). As of 11:00 AM on April 8, all of the public-facing services at CLASSE were updated to address this vulnerability. We constantly monitor our internet traffic and are actively performing security audits and scans of our network and systems. At this time, we do not believe it necessary for CLASSE personnel to change their CLASSE passwords. If we receive new information or recommendations that require action from CLASSE personnel, we will issue a revised statement.
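For those who manage their own systems, the affected OpenSSL releases are 1.0.1 through 1.0.1f (CVE-2014-0160); 1.0.1g and the older 0.9.8 and 1.0.0 branches are not affected. The sketch below shows one way to test a version string against that range. It is an illustration only, not a description of the audits performed at CLASSE; the function name is hypothetical, and it does not handle the 1.0.2 pre-release betas or distribution packages that backported the fix without changing the version string.

```python
import re

# Heartbleed (CVE-2014-0160) affects OpenSSL 1.0.1 and 1.0.1a through 1.0.1f.
# Assumption: the input is a plain upstream version such as "1.0.1e".
_VULNERABLE = re.compile(r"^1\.0\.1[a-f]?$")


def is_heartbleed_vulnerable(version):
    """Return True if the upstream OpenSSL version is in the affected range."""
    return bool(_VULNERABLE.match(version.strip()))


if __name__ == "__main__":
    for v in ("0.9.8y", "1.0.0l", "1.0.1", "1.0.1f", "1.0.1g"):
        print(v, is_heartbleed_vulnerable(v))
```

The version string to check can be taken from the second word of the output of `openssl version`.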

Tuesday, April 8, 2014

  • There will be a brief outage in network connectivity between Wilson Lab and the rest of campus, starting at 9:00 AM on Tuesday, April 8, and expected to last up to 30 minutes. Note that wireless service via RedRover/eduroam will also be unavailable.

The purpose of the outage is for CIT to activate a new high-speed connection between Wilson and CCC, which will eventually provide 10-gigabit network connectivity among the buildings of CLASSE (currently 1-gigabit).

Wednesday, April 02, 2014

Tuesday, March 18, 2014

  • LNS61's USER$DISK4 has been restored to its state as of January 8, 2014. Newer files will be restored later this morning.

Monday, March 17, 2014

  • LNS61's USER$DISK4 failed over the weekend. It is shared by LNS62. We are attempting to restore the disk from tape, but have encountered some additional hardware problems. It might be available again some time late this evening or tomorrow. Fortunately, LNS61 and LNS62 are VMS computers that very few people use.

Thursday, March 6, 2014

  • 10AM Outlook 2010 under Windows 7 stopped working. Use Outlook Web App at http://outlook.cornell.edu/ instead.
  • 4:25 PM Outlook 2010 under Windows 7 is working again.

Friday, February 28, 2014

  • 9:53 AM -- We believe that all of the CLASSE computing services are back to normal.

Thursday, February 27, 2014

  • 09:54 AM -- There was a campus-wide power outage. Wilson Lab's power was restored at about 09:55. We are in the process of bringing up systems. Please be patient.
  • 3:20 PM -- Most services are up. Unfortunately, many CLASSE SAMBA directories are still unavailable. As a result of the power outage, they all are undergoing disk consistency checks. Because of their large sizes, the checks are taking quite a while, and we don't know when they will be finished.

Tuesday February 4, 2014

  • 3:51PM - Resolved with a configuration change on the cluster host.

The VPN endpoint has been moved to the CLASSE cluster. We are investigating some issues with connections after the move.

Previous Status Messages:

Topic revision: r427 - 18 Dec 2018, JamesPulver