[RESOLVED] Switch failure

3:25pm ET We are experiencing some connectivity issues in the Miami-A Network.

4:20pm ET We had a core switch power failure. We have replaced the switch and all services have been restored.

CURRENT AFFECTED SERVICES: Network connectivity to Miami-A01 through Miami-A04

WEBBIES ARE UP AND RUNNING.


Console maintenance

3:25pm ET We are bringing the console down for about 30 minutes while we apply a fix.

4:20pm ET We finished testing and the console is working smoothly.

CURRENT AFFECTED SERVICES: Webby Console.

WEBBIES ARE UP AND RUNNING.


mia-b03 Maintenance

1am EST: Performing an emergency reboot on mia-b03. Shutting down all Webbies then rebooting.

1:40am EST: mia-b03 coming back up now.

1:50am EST: mia-b03 is now 100% back online. Sorry for the unexpected downtime.


Hardware failure miami-a06 [solved]

4AM EST: We are experiencing multiple hard drive failures and are investigating on-site.

6AM EST: The array is rebuilding and running a long checksum process. Webbies remain offline. This process may take a few hours.

7AM EST: We’re still troubleshooting this issue and waiting for the array processes to finish.

9:50AM EST: The array rebuild is at 78%.

2:30PM EST: The array rebuild has halted and we are now copying all Webbies off of mia-a06 to perform recoveries. More information will be posted here as we know more.

4:00PM EST: We are still copying Webbies off of mia-a06. We are trying to make backups of all Webbies before we attempt any risky RAID array recovery procedures.

4:30PM EST: We are 1/3 of the way through backing up all Webbies on mia-a06. Because Webbies vary in size, we can’t say for sure how long the remaining 2/3 will take. More to follow.

7:00PM EST: Webbies are coming back up one at a time as we restore them. Emails are going out as each Webby comes up.

1:00AM EST: We have been able to recover all the data from miami-a06 and have moved Webbies to other nodes. A few Webbies are still pending while their raw data is recovered. We are waiting for that process to finish so we can deploy those Webbies without any data loss.

9:00AM EST: After a long process of recovering raw data from Webbies, we have successfully recovered every Webby and moved each one to another node. Everyone has been emailed and notified.

We thank everyone for their patience and understanding during this long and extensive process. We could have gotten Webbies back much faster by giving up on the data, but we decided to go down to the block level and restore the data completely.

Please contact support if you have any further issues.


Hardware failure miami-a07 [solved]

4AM EST: We are experiencing multiple hard drive failures and are investigating on-site.

5AM EST: We had to replace 2 faulty drives in miami-a07 and reboot it. All Webbies are back online.

Please contact support if you’re having any issues and your Webby is on miami-a07.


Webby Manager maintenance

10:28 pm EST: We are making some changes to Webby Manager to accommodate today’s changes. It should be back soon.

10:35 pm EST: We’re up and running again.

CURRENT AFFECTED SERVICES: Webby Manager, API, New Customer Signups, Redeployments.

WEBBIES ARE UP AND RUNNING.


Webbynode Mia-a0* Move

[TROUBLESHOOTING] If your Webby is on a node that is now back up and you cannot connect to it or ping it, please reboot it from the Manager.

10:00 am EST All nodes in mia-a0* have been safely shut down for relocation.

11:00 am EST All nodes are now physically in our second datacenter. No problems during move.

11:30 am EST Server mounts are being installed and all switches and firewalls have been racked.

12:55 pm EST We are still racking nodes and bringing them online one at a time so we can address any issues that may come up. So far there have been no problems. We are moving slower than expected so that we do not cause any undue harm by rushing the process. As nodes come online we will update their status here.

2:30 pm EST mia-a06 Webbies are now coming up. They will come online as the VMs finish their consistency checks.

3:00 pm EST mia-a06 is 100% up.

3:40 pm EST Webbies on mia-a03 and mia-a05 are starting to come up now.

4:20 pm EST mia-a05 is 100% up.

4:35 pm EST mia-a03 is 100% up.

4:45 pm EST mia-a02 is starting to come up. mia-a04 is 100% up.

5:20 pm EST mia-a07 is starting to come up. mia-a02 is 100% up.

6:00 pm EST mia-a07 is 100% up. Working on mia-a01 now.

7:50 pm EST We’ve had a RAID card failure on mia-a01. We’ve replaced the card and the RAID is coming back online. Unfortunately mia-a01 was the last node we brought online, so we found this problem late. As soon as the consistency check is finished, Webbies on mia-a01 will start coming back online. We will keep you up to date on its status.

8:06 pm EST Our last node, mia-a01, is up. Please reboot your Webby and open a ticket if you experience further problems. Thanks a lot for your patience on this very tough day.


Webby Manager Maintenance

5:15 pm EST: We are currently performing maintenance on the Webby Manager network, and connectivity from the Manager to all nodes is affected. All Webbies remain up and running. Expect intermittent downtime for the next 1 to 2 hours.

CURRENT AFFECTED SERVICES: Webby Manager, API, New Customer Signups, Redeployments.

WEBBIES ARE UP AND RUNNING.

[UPDATE] This maintenance is complete. Thank you for your patience during this downtime. Our next maintenance will be the physical server move starting Sunday, August 16th at 10:00am EST.

