Old LEPP Network Status Notices 2011-2013


This page contains information on issues that were under investigation in 2011-2013. If you know of or are experiencing any problems now, please contact the Computer Group.

For information on issues that were under investigation in 2004-2010, see NetworkStatusArchive.

Current issues are described on the page NetworkStatus.

November 21, 2013 (Thursday)

There might be a power interruption at 1:30PM EST.

Today's steam outage is due to issues with the Central Plant emergency diesel generator.

Wilson Lab's current electrical feed is via Kite Hill substation to the Maple Avenue substation (Bus #1 and Bus #2), which are powered by the gas turbine combined heat and power system.

The turbines will intentionally trip off today at 1:30PM to start-test the diesel generator. At the same time Bus #1 and #2 will synch-switch to NYSEG supplied power.

The switch should be seamless, but small/short transients are possible.

The test will conclude by 2:00PM.

November 19, 2013 (Tuesday)

There will be a brief network outage starting at 9am EST, Tuesday, November 19, 2013. We expect the outage to be less than 10 minutes but it could be longer if we encounter problems.

For details, see the notice for November 12.

Unfortunately, November 12th's attempt did encounter serious problems.

November 12, 2013 (Tuesday)

There will be a brief network outage starting at 9am, Tuesday, November 12, 2013. We expect the outage to be less than 10 minutes but it could be longer if we encounter problems.

This outage will affect all of LEPP's network services, including wired network connectivity for Newman Lab, the Physical Sciences Building and Wilson Lab. We don't expect it to directly impact the CESR, ERL and SRF control systems' networks, but connections between LEPP and CHESS and between LEPP and the outside world will go down. Connections between CHESS and the outside world will be unaffected.

The central LEPP router will be upgraded from a Cisco router to a Vyatta router which will provide higher throughput and additional functionality. Among other things, it will improve LEPP's connection to the campus backbone from 100 Megabits/sec to 1 Gigabit/sec.
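
As a rough illustration of what the tenfold increase in backbone link speed means, the sketch below computes the ideal transfer time for a file at both rates. The 1 GB file size is a hypothetical example, and the calculation ignores protocol overhead, congestion and end-host limits, so real transfers will be slower.

```python
def ideal_transfer_seconds(size_bytes: float, link_bits_per_sec: float) -> float:
    """Lower bound on transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / link_bits_per_sec


OLD_LINK = 100e6   # 100 Megabits/sec (old backbone connection)
NEW_LINK = 1e9     # 1 Gigabit/sec (new backbone connection)
FILE_SIZE = 1e9    # hypothetical 1 GB file, chosen only for illustration

old_seconds = ideal_transfer_seconds(FILE_SIZE, OLD_LINK)
new_seconds = ideal_transfer_seconds(FILE_SIZE, NEW_LINK)
print(f"{old_seconds:.0f} s at 100 Mb/s, {new_seconds:.0f} s at 1 Gb/s")
```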

October 1, 2013 (Tuesday)

There is a brief network outage planned for the Physical Sciences Building on Tuesday morning October 1, 2013, from approximately 7:30-8:00am for CIT to install some upgraded network components.

August 22, 2013 (Thursday)

  • Wilson Lab lost power briefly due to a passing thunderstorm. Some core systems lost power and needed to be rebooted. Recovery should be mostly complete as of 1900 EDT.

August 7, 2013 (Wednesday)

  • Wilson Lab's network had problems starting at about 4:20pm. They resulted in the CLASSE Linux cluster crashing, which caused additional problems. The cluster was up and things seem to be back to normal by about 6:15 PM.

August 3, 2013 (Saturday)

July 24, 2013 (Wednesday)

  • There was a brief, accidental network outage at about 9:30AM while the UPS was being moved. Apparently some power connectors were unexpectedly jostled.

July 14, 2013 (Sunday)

  • On Sunday, July 14th between 6:00am and 12:00pm CIT Network Engineering will upgrade the code on the campus network routers. There will be several 15 minute outages in the connections among all CLASSE buildings.

July 10, 2013 (Wednesday)

  • Most of the lab's Linux and Unix computers stopped responding at about 16:45 (4:45 pm). Services resumed at about 17:05. We are investigating.

July 8, 2013 (Monday)

  • The accelerator file systems have been moved to the central CLASSE Infrastructure cluster. If you are unable to access or write to any accelerator file systems and are working on a personal desktop or workstation, a reboot may be the easiest and quickest solution. Alternatively, please email service-classe@cornell.edu and the computer group can fix the mount points or schedule a reboot.

May 27, 2013 (Monday)

May 14, 2013 (Tuesday)

  • There was a brief power outage in Wilson and Newman Labs at 7:25 AM this morning, just long enough to take out the majority of CLASSE computing services. Most of them were up by about 11:00 AM.

May 8, 2013 (Wednesday)

  • The cups print server is down. We hope to have it back up early Thursday morning.

  • Early next week, wireless access in Newman Lab. will be switched over to Red Rover.
  • On Wednesday, May 8, at 10 AM in Newman 311, a session on "How To Use VPN" will be provided for those who need help using the VPN to access CLASSE networks while connected to Red Rover.

For someone on Red Rover, printing and access to Lab. resources (Samba, Vault, etc.) will require the use of the CLASSE VPN (Virtual Private Network). Use of the CLASSE VPN requires the OpenVPN client software to be installed on the computers that you use. CLASSE computer staff will install OpenVPN on CLASSE managed laptops which don't already have it. Instructions for installing the client yourself on any personally owned computers are available on the Wiki page below:

https://wiki.lepp.cornell.edu/lepp/bin/view/Computing/CmpGrp/Obsolete/OpenVPN

Please stop by Wednesday's session for help installing, configuring, or using the VPN client. Bring your laptop if you have one. This also will be an opportunity to make sure you know your CLASSE password. You can test whether or not you know your CLASSE password by attempting to view the following page:

https://foswiki.classe.cornell.edu/Sandbox/Testauth/

If you are unable to view that page, please email service-classe@cornell.edu or stop by the VPN session for help resetting your password. We recommend viewing the following documentation to help minimize any future disruptions from the gradual process of replacing legacy usernames and passwords with CLASSE credentials.

https://wiki.lepp.cornell.edu/lepp/bin/view/Computing/Obsolete/CLASSESinglePassword

Cornell's documentation for Red Rover (which already provides wireless access in Wilson Lab, the Physical Sciences Building, and many other buildings on campus) is available at http://www.it.cornell.edu/services/redrover/. Particularly useful are the HowTo (which helps get you set up) and Frequently Asked Questions sections.

Please join us at 10AM Wednesday in Newman 311, and as always, please email service-classe@cornell.edu with any computing or IT related problems or questions.

April 17, 2013 (Wednesday)

  • The WinAPP virtual server had a virtual hardware failure. Time to recover is unknown. Updates will be posted here and to our CLASSE-IT-NEWS mailing list.
  • April 18, 2013 - Update:
    • We are restoring from a backup. Current estimate is sometime tomorrow for full restored service.
    • We believe we have identified the cause of the failure and are taking steps to prevent it from happening again.
  • April 19, 2013 10AM - Update:
    • We have now restored from backup and are re-configuring WinAPP for functionality.
    • While it may appear to be available, it currently will be restarting several times as we update functionality. Please do not use it until we notify you that it is fully re-configured in a status message.
  • April 19, 2013 4PM - Update:
    • WinAPP is now available again for use.

March 14, 2013 (Thursday)

  • The \\Samba service went down at about 2:45 pm. It was restored at about 3:50 pm. (A clustered file server had been shut down incorrectly.)

March 7, 2013 (Thursday)

  • The LEPP Unix home disk server was unavailable from approximately 12:30 to 8:00am. Work is underway to replace our legacy home disk server with a high-availability service.

January 31, 2013 (Thursday)

  • 10:25AM - Time skew fixed on VPN endpoint via NTP sync - VPN now accessible
  • 10:15AM - Some services back on line
    • CUPS
    • Matlab / Windows License Server up
    • Samba
  • We are currently working on recovering the License Server and VPN endpoint
  • 8:43AM - various LEPP services are down.
    • License Server for Matlab etc
    • SAMBA service
    • VPN Endpoint
    • Printing (CUPS)

  • There was a campus-wide power outage around midnight. Recovery is underway, and will continue throughout Thursday.
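
The 10:25AM entry above says only that the time skew on the VPN endpoint was fixed "via NTP sync". For readers unfamiliar with how NTP estimates skew, here is a minimal sketch of the standard offset and round-trip delay calculation from the four timestamps of one request/response exchange. The timestamps are hypothetical values for illustration, not measurements from the VPN endpoint, and this is not the procedure the Computer Group actually ran.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Standard NTP estimates from one exchange.

    t1: client send time (client clock)
    t2: server receive time (server clock)
    t3: server send time (server clock)
    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # how far the client clock is behind the server
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay
    return offset, delay


# Hypothetical exchange: client clock is 5 s behind, 0.1 s network transit each way,
# and the server takes 0.1 s to reply.
offset, delay = ntp_offset_and_delay(t1=100.0, t2=105.1, t3=105.2, t4=100.3)
print(f"offset = {offset:.3f} s, delay = {delay:.3f} s")
```

The offset estimate assumes the outbound and return paths take about the same time; asymmetric paths shift the result by half the difference.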

January 24, 2013 (Thursday)

  • The server known as webdb will be restarted at 4:30pm to increase its memory from 1GB. This will temporarily take down the CLASSE Training Database and CLASSE ELOG services.

January 16, 2013 (Wednesday)

  • The CESR VMS Control System cluster will be shut down and then restarted at 10AM. Downtime is expected to be less than an hour. Some cluster members will be up in much less time.
  • The CESR VMS Control System cluster was up again by about 10:45.

January 3, 2013 (Thursday)

  • Our NX server is down. We'll power-cycle the system and investigate further Friday morning.

December 11, 2012 (Tuesday)

  • Starting at 10am, erp101 will be upgraded to 64-bit SL6. This will result in a temporary outage of the main erp file system, p1adata, and ERP home directories. During this upgrade, no EPICS Process Variables from the ERP subnet will be archived.

October 23, 2012 (Tuesday)

  • There seems to be an issue with the LEPP Windows Firefox Update that was pushed last night. It may ask you to reboot your computer every time you try to start Firefox. The LCG is aware of this issue and is planning a revised push to resolve this problem. More updates will follow.
  • Update 1
    • The initial cause of the issue has been identified. While testing remedial steps, reports of it resolving itself were received.
    • At this time, I believe if you try running Firefox and still get a request to reboot, please do so. Then try running it again - it seems it will then work for at least some users.
    • If after this you still have issues, please submit a service request and we will take a look at your particular computer.

October 22, 2012 (Monday)

  • LNSCU5, LEPP's Unix mail server mail.lepp.cornell.edu, had a disk fail as a side effect of Sunday's power glitch. A new disk was installed and files were restored from backup tapes. The system was up again by about 1PM.

October 21, 2012 (Sunday)

  • At around 9am this morning there was a (crow induced) several cycle glitch on 5 of the 6 phases of power to the Lab. from Kite Hill. Most critical systems are now back up, but the LEPP Computer Group is still working to recover a handful of infrastructure services.
  • While recovering from this morning's power glitch, a breaker in the LEPP Trailer has repeatedly tripped. To help prevent any more wide-spread disruptions, we are forced to leave a few legacy disk servers down for the night.

October 9, 2012 (Tuesday)

  • At 10am, lnx280c (also known as cesrweb) will be rebooted. This is necessary to update lnx280c's BIOS in a first step at addressing two crashes we've seen in the last month.

October 2, 2012 (Tuesday)

  • LNX200, LEPP's NX server, will be upgraded to SL6 starting at 10am. Users should expect lnx200 to be down for most of the day, and possibly into Wednesday. LNX201 can be used via SSH or VNC while lnx200 is unavailable. For instructions, please see RemoteLinux.
  • LNX200 is now running 64-bit SL6. For the new key required to connect to lnx200 using NX (and general setup instructions), please see FreeNX.

September 27, 2012 (Thursday)

  • accserv, the central accelerator libraries and repository server, will be upgraded to SL6 starting at 2pm. As this will require a full dump and load of the SVN repository, this upgrade could take an hour or two. During this time, /nfs/acc/libs and the Accelerator repository will be read-only and periodically unavailable.
  • The upgrade of accserv went smoothly.

September 25, 2012 (Tuesday)

  • Maintenance has been scheduled for erp101 between 10 and 11am Tuesday, 9/25. This maintenance will result in periodic outages of /nfs/erl/erp, ERL Control System home directories, and the archival of EPICS data from the ERP subnet.

September 17, 2012 (Monday)

  • LNX201 is currently hung and will need to be rebooted. To avoid multiple disruptions for our users, we will also proceed with the upgrade to 64-bit SL6. A followup announcement will be sent when lnx201 is back up. In the meantime, lnx6212 can be used in lieu of lnx201.
  • LNX201 has now been upgraded to SL6. As the host key changes with each upgrade, you may need to execute ssh-keygen -R lnx201 before logging in again. For future reference, this is documented at ManInTheMiddleAttack.
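
To show what removing a stale host key amounts to, the sketch below filters entries for one host out of known_hosts-style text. It is only an illustration of the idea under simplifying assumptions: it handles plain (unhashed) host names, and the sample keys and the 192.0.2.10 address are made up. For real use, run ssh-keygen -R as described above, since it also handles hashed entries and keeps a backup of the file.

```python
def drop_host_entries(known_hosts_text: str, hostname: str) -> str:
    """Return known_hosts text without the lines that list `hostname`.

    Simplified sketch: only unhashed entries are recognised.
    """
    kept = []
    for line in known_hosts_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)          # keep blank lines and comments
            continue
        fields = stripped.split()
        # An entry may start with a marker such as @cert-authority or @revoked.
        if fields[0].startswith("@") and len(fields) > 1:
            host_field = fields[1]
        else:
            host_field = fields[0]
        if hostname in host_field.split(","):
            continue                   # drop the stale entry for this host
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical known_hosts content, for illustration only.
sample = (
    "lnx201,192.0.2.10 ssh-rsa AAAAexamplekey1\n"
    "lnx6212 ssh-rsa AAAAexamplekey2\n"
)
cleaned = drop_host_entries(sample, "lnx201")
print(cleaned, end="")
```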

August 27, 2012 (Monday)

  • As of about 9AM, LEPP's RT problem ticket system is not functioning. (The computer lnscua accepts mail deliveries but does not add them to the RT database.)
  • 10:45 As a temporary work-around, service-lepp is being forwarded directly to all LEPP computer group members. Please continue to report problems to service-lepp@cornell.edu. We'll respond to them as quickly as we can.
  • 12:00 RT is once more working.

August 25, 2012 (Saturday)

  • Network connectivity was restored at 8:31 AM.

There will be a scheduled network outage of the LEPP network from 6 AM until Noon on Saturday, August 25. The outage might be shorter, but don't plan on it. CIT will be working on network cabling in Wilson Lab as part of the "EzraNet" project.

Networks inside Wilson Lab itself should continue to work, but there will be no connectivity to the rest of campus or to the outside world. In particular, there will be no connectivity between CHESS, Wilson, Newman and the Physical Sciences buildings. The LEPP network connections in Newman and the Physical Sciences Building depend on their connections to Wilson, so they will not work at all.

The Red Rover wireless network connections in the Physical Sciences Building should continue to work for access to the rest of campus (other than LEPP and CHESS) and to the outside world.

August 23, 2012 (Thursday)

lnx111 will be down for approximately 30 minutes while we replace a failed system disk. During this time, /nfs/cleoc/data8, /nfs/c3mc/mc1, and /nfs/grp/uk1 will be unavailable.

August 9, 2012 (Thursday)

From roughly 12pm to 1pm, cesr101, cesr102, and cesr104 will be rebooted to finish their move into the new CESR rack. There should be no outage of cluster services during this maintenance.

August 7, 2012 (Tuesday)

From roughly 10am to 1pm, the CESR Linux cluster will be consolidated into the new CESR rack. This maintenance will require physically moving cesr101, cesr102, cesr103, and the CESR SAN switches and iSCSI devices. As we will be moving the main CESR iSCSI device, any user processes running on cesr101 - cesr109 may need to be killed.

August 1, 2012 (Wednesday)

Between 10am and 12am, each CESR Linux cluster member will be rebooted at least once to install new firmware. Cluster services will be balanced during this maintenance and there should be no interruption of services.

July 10, 2012 (Tuesday)

At 10am, CLASSE's Invenio installation will be updated to v1.0.1. This is a minor bugfix release only, and should only result in ~15 minutes of downtime for CLASSE's Electronic Document Management System - https://edms.classe.cornell.edu/.

June 22, 2012 (Friday)

A HyperNews vulnerability is being exploited, leading to compromises of HyperNews fora. To protect against this, LEPP has now restricted access to the CLEOc HyperNews server using our LEPP Network Principals.

Users accessing the CLEOc HyperNews server will now have to authenticate twice, first using their LEPP Network principal. They will then need to actually login to HyperNews using their HyperNews username and password.

As usual, users can reset their network principals using the "knetpw" command on Linux or by emailing service-lepp@cornell.edu. A little more information on network principals can be found at UserAccountsAndPasswords.

June 19, 2012 (Tuesday)

  • Maintenance has been scheduled for accfs (also known as lnx113) starting at 10am Tuesday 6/19. We expect this maintenance to be complete by 11am. Any file systems served from lnx113, including /nfs/acc/user and /nfs/acc/temp, will be unavailable during this maintenance. Please take this opportunity to minimize the effect of this outage by reconfiguring your environment as recommended at NetworkedFilesystems.

June 18, 2012 (Monday)

  • One of the main circuit breakers tripped in the computer room in the LEPP trailer at about 1:42 pm today, Monday, June 18. Power was restored at about 1:50. Some computers are still down.
  • lnx114 and lnx118 are still running FSCK as of 6:30pm and may be down for several more hours.

June 12, 2012 (Tuesday)

  • Maintenance has been scheduled for accfs (also known as lnx113) starting at 10am Tuesday 6/12. Any file systems served from lnx113, including /nfs/acc/user and /nfs/acc/temp, will be unavailable during this maintenance. Please take this opportunity to minimize the effect of this outage by reconfiguring your environment as recommended at NetworkedFilesystems.
  • This maintenance was complete by 10:45am

June 4, 2012 (Monday)

  • Samba, also known as lnx117, will be rebooted at 4:45pm to address a locking problem with the accelerator file systems. During this time, the samba service and any file systems served by lnx117 will be unavailable.
  • As of 5:45PM, lnx117 was still running fsck, taking much longer than expected.
  • The Samba server (lnx117) was up again at about 8PM.

June 3, 2012 (Sunday)

  • lnx113 and its file systems are now back up. We believe we tracked down and have addressed the increased disk usage. Please send any problems or questions to service-lepp@cornell.edu.

June 3, 2012 (Sunday)

  • /nfs/acc/user filled up, eventually taking down lnx113. The system is currently running FSCK over its file systems. Until it is up, any access to /nfs/acc/user, /nfs/acc/temp, or any of lnx113's file systems will hang.
  • LNX113 will remain down until Sunday.
  • www.lepp.cornell.edu was down from 5am to 10:30am

May 15, 2012 (Tuesday)

  • At about 11:15 the LEPP network stopped responding. Nameservice lookups failed. The router connecting us to the campus network did not respond. After rebooting the router, things started working again, although there was another brief interval when the router did not respond. By about 11:40 the LEPP network was working again. We are still investigating the cause of the problem.

May 8, 2012 (Tuesday)

The SSL certificates for imap.lepp.cornell.edu and smtp.lepp.cornell.edu have been updated. You may need to accept the new certificates in your mail client. The new certificates will be good for three years.

April 24th, 2012 (Tuesday)

At 2012/04/24 17:08:27.000 VPN19, our VPN endpoint, went down. We are investigating now.

As of 2012/04/26 09:09:21.000, VPN19 is back up.

February 21, 2012 (Tuesday)

At about 10:45, LEPP's gateway to the campus network stopped responding. That connectivity was restored at about 11:15. The problem seems to have been caused by a failure in our link to Newman Lab. We are in the process of restoring Newman Lab's connectivity.

As of 12:20, network connectivity should be back to normal for both Newman and Wilson Labs. The problematic network connection at Newman has been isolated.

February 14, 2012 (Tuesday)

Maintenance has been scheduled for the CESR Linux Online and Home file systems from 10:00 to approximately 10:30am. Occasionally during this maintenance, access to the CESR Online and home directories may pause. In addition, Linux cluster members may reboot during this maintenance, killing any user jobs running on that system.

January 25, 2012 (Wednesday)

Maintenance was conducted on the CESR Linux Online and Home file systems from 11:30am to 1:30pm Wednesday, 1/25. During this time, /nfs/cesr/online and the CESR Control System home directories (/home/w201_ctl, /home/w207_ctl, etc.) were unavailable.

January 18, 2012 (Wednesday)

Starting at 10:00am, maintenance will begin on cesr109 and lnx184c. We expect lnx184c to be back up by 11:00 and cesr109 by 12:00. The CESR Online Linux File Systems may pause momentarily when cesr109 is first shut down and again when it is brought back up. Announcements will be made over the beam phone before these potential pauses.

Starting at 10:30am, the new CESR Nearline subnet will be deployed. During this time, off-site access from cesr3101, cesr3102, and cesr3103 will be temporarily disabled.

January 10, 2012 (Tuesday)

From approximately 10-12am, LEPP's Invenio installation was updated to v0.99.4. During this time, all services running on edms were unavailable. This includes our DocumentDatabase (https://wiki.lepp.cornell.edu/lepp/bin/view/Computing/DocumentDatabase) and Indico ConferenceManagement system (https://wiki.lepp.cornell.edu/lepp/bin/view/Computing/ConferenceManagement).

January 6, 2012 (Friday)

ILCSIM's RAID controller failed Friday night at about 10:09 PM. The system has been reset, and the /nfs/ilc/sim filesystems are once again available READ-ONLY. We have requested a replacement RAID controller from Red Barn and will schedule its replacement when it arrives.

January 5, 2012 (Thursday)

The CESR VMS Control Cluster was up by Noon.

January 4, 2012 (Wednesday)

The CESR VMS Control Cluster is down. Its user disk (cesr29$dkb200) went offline at 3-JAN-2012 23:33:23.46. Attempts to remount it were unsuccessful. As of about 15:00, about 10GB of its content has been restored from tape to another disk, suggesting another 20 hours might be required.

December 22, 2011 (Thursday)

As LEPP continues the migration from 64-bit SL5 to 64-bit SL6, two farm nodes (lnx301 and lnx302) are now running SL6. In addition, LNX6212 is available remotely as a public SL6 system, and lnx4200 is running SL6 in the Wilson Public Terminal Room.

During this transition period, please be mindful of what code you build where. The best way to ensure you are building on an appropriate system is to use any central servers that are dedicated as build nodes for your project.

Additional information will be sent as this transition progresses. For initial notes on SL6 and transitioning to 64-bit, please see:

For help finding which computer is appropriate for your use, please see:

For help using our Grid Engine batch queuing system, please see:

November 23, 2011 (Wednesday)

  • The ilcsim disk server crashed around 10:10am. The system is currently booting back up and running fsck over its filesystems.
  • Ilcsim and its file systems were back up around 2pm.

November 22, 2011 (Tuesday)

  • LEPP's edms server was unavailable from approximately 10am until 2pm. This downtime was required to update our installations of Indico (conference management) and Invenio (document database).

November 11, 2011 (Friday)

  • 10:42AM - Network in the LEPP Trailer went down.
  • 11AM - Networking and many services are back up, however we are still recovering some servers and services.
  • 11:34AM - Resolved currently. There will be a scheduled outage to make some final configuration changes.

October 23, 2011 (Sunday)

  • A power controller in LT107 has failed, bringing down at least lnx106, lnx114, lnx115, and lnx118. We hope to have these systems back up early Monday.
    • All affected systems were back up by 2:30pm Monday.

October 20, 2011 (Thursday)

  • ~ 13:05 LNSCU5 (mail.lepp.cornell.edu) glitched again.
  • 13:30 rebooted
  • 14:05 shutdown for repairs
  • 15:00 up with different system box.

October 17, 2011 (Monday)

  • As of 10AM, Winapp (PC102, Windows Terminal Server) is down for the MS Office 2010 update. Updates will be posted here.
  • As of 11AM, Winapp is back up with the new Office 2010 software. Linux users will need to wait for the changes to the commands to be made. See here for more updates.
  • As of 11:34AM tests are complete on the Linux commands. They will be updated on all Linux computers tonight. This update is complete.

October 14, 2011 (Friday)

  • As of 12:15pm, all Linux farm nodes have been repaired.

October 13, 2011 (Thursday)

  • The Unix mail server, lnscu5, mail.lepp.cornell.edu, crashed several times between about 1:30 and 2:30 PM, then was down until about 4:15. We replaced its power supply. (The spare system had SCSI problems.)

  • The Unix/Linux home disk failed at about 4:35 PM. We are investigating.
  • 6:40 pm -- due to the extent of the hardware failure, it seems unlikely that the home disks will be available this evening.
  • 7:30 pm -- The disks on the old home disk server are not recoverable. The Unix/Linux home-disk service is in the process of being moved to the Linux computer which had been doing daily backups of them.
    • The contents of the new home disks will be as they were at about 4:00 AM. this (Thursday) morning, when the most recent backup was performed.
    • All systems which had files open on the central Unix/Linux home disks will have to be rebooted in order to see the new home disk server. This includes all batch systems and most desktop Linux computers.
    • CESR and ERL control systems' home directories are not on the central home disks and are not affected.
  • 8:30 pm -- Most critical systems have recovered. Some are running fsck, but should be up shortly. All other systems will likely remain unavailable until Friday.

October 4, 2011 (Tuesday)

Maintenance is scheduled for CESR Linux systems from 10am to 12pm on Tuesday, October 4. Any processes running on the new CESR Linux cluster at 10am Tuesday may be killed. In addition, periodic delays may be seen accessing the CESR Online filesystem and CESR Control Account Home Directories during this maintenance.

August 31, 2011 (Wednesday)

  • lnx102 went down around 6:45pm. This takes down the /nfs/opt linux filesystem as well as the Mathematica license server. Recovery is underway.
    • lnx102 was back up around 8:10pm. The system will likely require more maintenance and downtime on Thursday (9/1).

August 9, 2011 (Tuesday)

  • The CESR Control System will be down starting at 9am.

July 29, 2011 (Friday)

  • LEPP Unix home directories will be unavailable starting at 12:00pm for emergency maintenance on our home disk filesystems. During this time, any process trying to access a home directory will hang. Most processes should recover when the home directories come back up, but some systems may need to be rebooted to fully recover. We are not sure how long this maintenance will take, but are hoping it will be complete around 1:00pm.
  • The LEPP Unix home directories were back up by 1:00pm. Most problems resulting from this outage will be fixed by a reboot. Please reboot any workstations that are having problems, or submit a ServiceRequest and ask the Computer Group for assistance.

  • There was a network outage during the afternoon which was caused by a problem with one of the routers in the Campus Communications Center. Network connectivity was restored by about 4:30 PM. They're still investigating the cause.

July 26, 2011 (Tuesday)

  • LNS61 and LNS62 will be down starting at 5PM Tuesday evening. They will be moved to a different location in the Wilson computer room, W221. The computers will go down at about 5PM for an hour or two. This may cause some printers to be unavailable while they are down.
  • LNS61 and LNS62 were up by 7PM. Some printer queues weren't restarted until Wednesday morning.

July 25, 2011 (Monday)

  • Replicon is once more available, but its server had to be replaced and its software updated. Old bookmarks which directly access the server computer may not work. If you have problems accessing Replicon (e.g. "404 - File or directory not found") please update your browser's bookmark to use the URL https://www.lepp.cornell.edu/TimeSheet/replicon
  • File partitions on the LEPP Unix/Linux home disk server (SOL105) are having problems. The CESR and CLAN filesystems will be unavailable starting at 12:45pm while fsck is being run on them. When fsck completes, the file server, SOL105, will be rebooted. Unfortunately, we don't know how long the fsck operations will take, possibly as much as an hour or so.
  • sol105 and the unix home disks were back up by 1:45pm. Please email service-lepp@cornell.edu with any problems or questions.

July 22, 2011 (Friday)

  • Replicon is unavailable as of late last night. We hope it'll be available again some time this afternoon. The computer which provides its Web interface has failed. The computer which stores its database is OK, though.

July 21, 2011 (Thursday)

  • The Solaris Farm computers have had to be shut down starting at about 16:45. Two of the air conditioners in the LEPP Trailer computer room are refusing to stay on. The room is overheating in today's record high temperatures. We're not sure when we'll be able to turn them back on again. It may be several days.
  • In order to further reduce the heat, some of the older Linux farm nodes that are idle are also being shut down. Hopefully we can bring these back online tomorrow.

July 19, 2011 (Tuesday)

  • The CESR VMS Control System will be rebooted at 12:00pm to implement new access controls. This should result in less than 30 minutes of downtime for the VMS cluster, but it could be up to three hours before everything is done.
  • The CESR VMS control computers were up by about 12:50.

June 30, 2011 (Thursday)

  • The main CESR Switch will be rebooted at 9am. This should only result in roughly 15 minutes of downtime for all CESR online systems and subnets.
  • The ERL L0 archived EPICS data will be unavailable from approximately 10 - 10:30am. This maintenance is necessary to expand and avoid filling the underlying filesystem.

June 25, 2011 (Saturday)

LNS61 and LNS62 hung at about 3:40 PM on Saturday, June 25. The problem wasn't discovered until about 7:30 AM on Sunday. Both were up by about 8:45 AM. The problem is being investigated.

June 10, 2011 (Friday)

The air conditioners have been repaired, and all of the Solaris and Linux farm nodes were brought back online.

June 7, 2011 (Tuesday)

Many of the CESR control system and associated computers are unavailable as of about Noon. An attempt to update the firmware of the central CESR control system switch at Noon caused the switch to malfunction. (Updates in the past have had no problems.)

The problems were resolved at about 4PM.

June 5, 2011 (Sunday)

Multiple air conditioners in the LEPP Trailer have failed, leading to an over-temp alarm. As the Solaris farm is currently idle, the Solaris farm nodes will be shut down in order to alleviate some of the load. The Linux farm is also idle, so some of the older Linux farm nodes will also be shut down. Please submit a ServiceRequest with any questions or concerns.

June 1, 2011 (Wednesday)

The /nfs/ilc/sim1 and /nfs/ilc/sim2 filesystems will be unavailable from approximately 10am to 10:30am for scheduled filesystem maintenance.

May 22, 2011 (Sunday)

  • A core infrastructure switch/router failed around 6:15 am this morning. Connectivity to most systems was restored by 1pm. Access to the "compute farm" subnet is still down, as is access to the CLEO EventStore.
    • The Linux compute farm and CLEO EventStore were restored as of 9am Wednesday, May 25.

May 12, 2011 (Thursday)

  • A power-strip has tripped in LT107, bringing down a few farm nodes and servers (such as hypernews). Some load was removed from the power strip, and systems were quickly brought back online.

April 25, 2011 (Monday)

  • LNS61 and LNS62 were non-responsive at about 7 AM (not noticed until about 9:30am). Both were rebooted and were functional by 10AM. We are investigating what might have caused the problem.

April 22, 2011 (Friday)

Due to a VM issue on one of our Virtual Hosts, there will be a short down time for the VPN endpoint at around 1:35PM. It should be back up by 2PM.

April 21, 2011 (Thursday)

  • There is an Adobe Flash player vulnerability in the wild. The computer group is pushing a flash update on Windows. Firefox will close. There may not be a notification. If Firefox closes, please wait 3 minutes and then restart.
  • If you are running Windows 7, please go to this Wiki page and download and run the patching program update (InstallFusion2.1.8-1.exe is the name) to make sure you get this and future critical patches. Verify that you are on our Wiki.
  • https://wiki.lepp.cornell.edu/lepp/bin/view/Computing/Private/PatchSoftwareUpdateForWindows7

April 12, 2011 (Tuesday)

The upgrade of accserv is now complete, and the upgraded SVN repository is once again available. When accessing the SVN repository, users now need to authenticate using LEPP Unix Login Kerberos Principals. Please see the following page for more details concerning this upgrade. https://wiki.lepp.cornell.edu/lepp/bin/view/ACC/Private/SvnUpgrade

If after reading the above page you are unable to access the upgraded repository, please see the Accelerator Code Librarian or submit a service request.

April 11, 2011 (Monday)

For required functionality not available in our current server, the Accelerator group plans on upgrading the central Subversion repository server, accserv, starting at 5pm Monday 4/11. Users should plan for the repository server to be unavailable from 5pm Monday 4/11 until Tuesday morning, 4/12.

The upgraded server has been extensively tested by the accelerator group, and the upgrade process should, for the most part, be transparent. The only potentially disruptive aspect of this upgrade will be a change in the credentials used to login to the upgraded server.

Please see the following page for more details concerning this upgrade, and notify the Accelerator Code Librarian or service-lepp@cornell.edu with any questions or concerns. https://wiki.lepp.cornell.edu/lepp/bin/view/ACC/Private/SvnUpgrade

March 24, 2011 (Thursday)

As of 7:30AM, due to ongoing electrical work at Newman Lab, there have been several outages to the networks in the building, and there will be several more later in the morning.

March 14, 2011 (Monday)

LNS101 was refusing logins at about 4PM on Monday. It was back to normal at about 10AM on Tuesday, when Dan repaired a typo in Netgroups made by Selden.

March 11, 2011 (Friday)

The LCG began testing a new network mapping utility at 11AM today. This utility unexpectedly created some alerts on many end systems on the network. These originated from PC30.lns.cornell.edu. We apologize for any confusion; previous utility tests did not cause these sorts of disruptions and alerts. There may be some continuing stray alerts as we test network mapping tools. You can disregard these from inside our network for the time being.

March 7, 2011 (Monday)

Cornell University is closed this morning, which may delay the restoration of Lnx114, lnx115, and lnx118.

All systems were back up by 7pm.

March 5, 2011 (Saturday)

As of 12:00pm, lnx106, lnx108, and lnx117 (samba) have been restored. Lnx114, lnx115, and lnx118 will have to remain down until circuit 8 is repaired.

March 4, 2011 (Friday)

As of approximately 9:30pm, the following Linux fileservers are down. We expect to have these back up early Saturday.
  • lnx106, lnx108, lnx114, lnx115, lnx117, and lnx118

January 17, 2011 (Monday)

LNS123, the primary public NCD Xterminal boot server, has died. Please notify the LEPP computer group if you need to use an NCD Xterminal which will not boot so we can change its boot server to LNS101.

2004-2010 Network Status Messages

Please see the page NetworkStatusArchive for a record of previous status messages.

-- SeldenBallJr - 27 Sep 2013
Topic revision: r5 - 04 Apr 2022, AdminDevinBougie