Scheduled Maintenance on websrv01

Update: 9:29AM EST
Dell has successfully completed the maintenance work and replaced the failed hard drive on websrv01. All services are currently online; however, the server will be a little slow until it fully syncs the newly installed hard drive with its mirrored drive. We expect that this will take another few hours to complete. If you have any trouble, please let me know.

Original Post: Oct 27 @ 8:02PM EST
I confirmed earlier today with Dell that a service technician will be on-site in our Montreal data centre on Thursday, October 29th at 8:00AM EST to replace the failed hard drive on websrv01. While this maintenance is taking place, all services on the server will be unavailable. We expect that the downtime will last no longer than 1 hour; your patience as we replace the failed hardware is appreciated.

Just to confirm, all services on websrv01 will be unavailable:
Thursday, October 29th, 2009 @ 8:00AM EST

Unscheduled websrv01 Downtime

Update: Oct 21 2:24AM
websrv01 is back online. Diagnostic tests are running on the server as we speak (thanks to Greg), but initial reports indicate that 1 of the 2 hard drives on websrv01 has died. Luckily we run a RAID 1 (mirror) configuration, so the other drive is picking up the slack (whew). Dell is aware of the issue and will get back to me later in the day to schedule a visit to the data centre to investigate further. I will post more information as it becomes available.

Initial Report: Oct 20, 11:03PM
websrv01 is currently off-line. I am very sorry to report that we are experiencing major hardware issues and failures. I am working with Dell and our co-location provider to resolve the issue, but we expect that the server will be down most of the day on Wednesday while we recover service.

Unscheduled websrv01 Downtime

This morning we experienced a short unscheduled service outage on websrv01 due to a spam attack that took place early in the morning. This incident could easily have been avoided if a select few users had not chosen incredibly simple e-mail passwords. If you have a simple e-mail password, please change it immediately. Passwords should be alphanumeric, contain a minimum of 6 characters, and include no dictionary words.
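For anyone who wants to check a password against that advice before setting it, here is a minimal sketch in Python. It is only an illustration and not something that runs on our servers: the function name `password_problems` and the tiny `COMMON_WORDS` list are hypothetical, and I am assuming that "alphanumeric" means a mix of letters and digits.

```python
import re

# Hypothetical stand-in for a real dictionary word list (assumption).
COMMON_WORDS = {"password", "welcome", "letmein", "monkey", "dragon"}


def password_problems(password, words=COMMON_WORDS, min_length=6):
    """Return a list of reasons the password fails the advice above."""
    problems = []
    # Rule 1: a minimum of 6 characters.
    if len(password) < min_length:
        problems.append("shorter than %d characters" % min_length)
    # Rule 2: alphanumeric, read here as at least one letter and one digit.
    if not (re.search(r"[A-Za-z]", password) and re.search(r"[0-9]", password)):
        problems.append("must mix letters and digits")
    # Rule 3: no dictionary words, checked case-insensitively.
    lowered = password.lower()
    if any(word in lowered for word in words):
        problems.append("contains a dictionary word")
    return problems


if __name__ == "__main__":
    for candidate in ("abc", "password1", "k7tq2vx9"):
        print(candidate, password_problems(candidate))
```

An empty list means the password passed all three checks. A real check would load a much larger word list than the five words used here.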

Unscheduled websrv01 Downtime

We are currently experiencing an unscheduled service outage on websrv01 due to what we believe may be a hardware issue on the server. In fact, I think this could be the same issue we encountered on April 25th, and I hate to say it, but our co-location provider *still* has not fixed the misconfigured power port that our server is plugged into, so I am still unable to reboot the machine.

A technician has been informed of the problem, and someone is going down to the server to reboot it right now. Luckily, I am told there are people in the building today, so it should be back shortly. I will post an update as soon as I know anything.

Update 9:29AM
Data centre technicians are making their way to the server right now to fix the APC switch and restart the machine.

Update 10:26AM
I’m still waiting, and getting angrier by the minute. I apologize for the inconvenience.

Update 10:50AM
websrv01 is back online after the technician finally rebooted the server. I apologize once again for the inconvenience. I am fairly certain that they assigned John to my support ticket:

Data Centre Technician John

Unscheduled websrv01 Downtime

We are currently experiencing an unscheduled service outage on websrv01 due to what we believe may be a hardware issue on the server. Unfortunately our co-location provider misconfigured the power port that our server is plugged into, so we were unable to reboot the machine ourselves. Currently we have a technician assigned in Montreal who is on his way to the data centre to reboot the server and investigate further. We will update this post as more information becomes available.

Update 3:53PM
We are still working with our co-location provider to determine the exact cause of the problem. One theory currently being investigated is that we may be experiencing a distributed denial of service attack on the server. As soon as we have any further information, we will post it.

Update 6:20PM
The problem has now been resolved, and all service has been fully restored. It does in fact appear to have been a distributed denial of service attack, which fortunately ceased on its own. We sustained 1Mbit of HTTP traffic to websrv01 for only a short period of time before the server was unable to handle the requests. The 1Mbit wall continued until just after 6PM, when it stopped just as mysteriously as it began. Further investigation is ongoing, and any new information will be made available.

We apologize for the inconvenience.

Unscheduled websrv02 Downtime

A quick post to let you know that as of 10:55AM today websrv02 is off-line, and we are investigating the issue. At this time it appears as though the power port the server is plugged into at our data centre has malfunctioned, but we are confirming this now with the network administrators. I will update this post as new information becomes available.

Update 12:01PM
We have confirmed that the problem is an issue with the APC power unit that all servers in this particular rack are connected to. Our network administrators are installing a new APC power unit as we speak, and the issue should be resolved very shortly.

Update 1:50PM
I have just received another update from our network administrators stating that they are now replacing the switch that connects all of the servers on the rack. They have informed me that it should be operational again shortly.

Update 2:10PM
Service has now been fully restored. If you have any issues, please feel free to let me know. We apologize for any inconvenience this may have caused.

Silentweb News Update (Volume 4, Issue 3)

Final Upgrade Status Report

I am pleased to report that the upgrades to both websrv01 and websrv02 were completed successfully, and for the most part without incident. There were a few minor hiccups, which were caught by users, reported to us and resolved shortly thereafter. As of this moment there are no known issues with any services, so if you are having a problem, please report it to me so it can be resolved.

This has without a doubt been the most solid Plesk upgrade in history, and Parallels (the company that makes Plesk) definitely deserves some kudos for finally getting it right.

Our servers are now running Plesk 9.0.1, and there are plenty of new features, so feel free to log in and poke around. As I mentioned previously, the interface is rather different and takes a bit of getting used to, but I think you will find that it is actually much nicer to use.

New Webmail Interface Option

One of the many new features of Plesk 9 is an option that allows you to choose a different web-mail client. The default web-mail client that you are all used to is an open source application called Horde. Horde is feature rich, with tools like a calendar, address book, etc., but if you are looking for something a bit lighter, try AtMail!

You can choose which web-mail client you want to use on a per-domain basis, and can switch back and forth any time without losing your messages.

To enable AtMail:

  1. Log into Plesk.
  2. On the domain’s “Home” page, click the “Mail Accounts” icon.
  3. Now click the “Mail Settings” icon.
  4. Switch the “WebMail” select box to “AtMail 1.02” (the default is Horde 4.1.6).
  5. Click “OK”.

That’s it! You can now visit your web-mail page (e.g. http://webmail.yourdomain.ca) and log into AtMail using your e-mail account username and password. Don’t worry: if you decide you want to switch back, simply reverse the procedure.

Conclusion

I gather from some conversations that some of you are probably reading this message and thinking to yourselves, “what is Plesk anyways?” Plesk Server Administrator is a piece of software that we use on our servers to allow *you* to easily manage portions of your domain and hosting account (e.g. e-mail accounts, spam filtering, web statistics, password protected directories, FTP passwords, etc.).

Silentweb News Update (Volume 4, Issue 2)

Server Upgrade Status Report

I just thought I should send out a quick update to give everyone an idea how the upgrades are going.

websrv02:
We successfully upgraded websrv02 last night, with little in the way of problems. There was only minimal service interruption (about 15 minutes), and all services are operating as normal. If your website is on websrv02 and you are having any trouble, please let us know immediately.

websrv01:
We were having some trouble doing adequate backups of websrv01 and we cannot start these major server upgrades until we are satisfied our backups are reliable. Unfortunately we have no real choice but to push the scheduled downtime back until our backup process is complete, which could easily take another 2 or 3 hours.

Please be advised that there will be a service interruption while we are doing the upgrades.

Silentweb News Update (Volume 4, Issue 1)

New Scheduled Downtime Notice

I would like to notify you that there will be some scheduled downtime on both of our production servers this coming weekend while we perform maintenance tasks and upgrades.

websrv02: Friday, January 30th at 10:00PM EST
websrv01: Saturday, January 31st at 10:00AM EST

The first downtime should not last much longer than a few hours per server; however, Plesk upgrades have in the past gone over our maintenance window, so I hesitate to give a precise amount of time (~1 – 2 hours). There will be a second downtime after the Plesk upgrade is finished, which should be very brief (~15 minutes), as we apply operating system patches and reboot the servers.

During this maintenance window we will be upgrading both servers to the latest and greatest Plesk release (details below), as well as doing all of the regular operating system patches, and a PHP 5.2.6 upgrade.

Plesk 9.0.1 Upgrade

This is a significant Plesk upgrade that has many changes including a rather new visual interface. I will caution you that the new interface is different, and while you will need to learn the new way Plesk operates, I am sure that the new features will make up for any inconvenience.

Here are some of the many new features you may notice:

  • New visual interface.
  • Ability to add your logo and branding to control panel.
  • New optional web-mail client called AtMail.
  • Upgraded version of IMP / Horde web-mail client.
  • Option to run PHP in FastCGI mode.
  • Ruby on Rails support.
  • Ability to throttle bandwidth on high traffic sites.
  • Very much improved website backup and restore tools.
  • New anti-spam techniques (e.g. DomainKeys).
  • Better support for Safari 3.1.
  • The usual bug fixes and security hardening.

Website Backups and Account Cleanup

We do fairly regular backups of our servers as part of our disaster recovery plan, but if something happened, would you have a backup of your own content? I would hope that the answer to that question is yes, but in case it is not, now is a good time to remind you.

Please make sure that you keep backups of your websites, databases and e-mail.
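If you already have a copy of your website files on your own computer and just need a routine for archiving them, something like the following Python sketch may help. It is only an illustration: the function name `backup_directory` and the folder names in the usage line are hypothetical, and it assumes the files have already been downloaded (for example over FTP). It does not cover databases or e-mail.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source_dir, dest_dir):
    """Pack source_dir into a dated .tar.gz inside dest_dir and return its path."""
    source = Path(source_dir).resolve()
    dest = Path(dest_dir)
    # Create the backup folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    # A timestamp in the file name keeps older backups from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname stores paths relative to the folder name, not the full local path.
        tar.add(str(source), arcname=source.name)
    return archive
```

Calling `backup_directory("mysite", "backups")` would write a file such as `backups/mysite-<timestamp>.tar.gz`. Keeping a copy of that archive somewhere other than the same computer is the safer habit.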

While you are looking into what you need to back up, check for things you can delete. If there are old domains in your account they *must* be deleted. Old web-applications *must* be removed or upgraded to the latest release.