Some Fridays start out better than others…
Around 8:40AM PDT, we experienced a power failure at our co-location facility that took DRMetrix offline. A team has been working on the issue ever since.
10:00AM PDT – Power has been restored. The underlying cause is still being researched. We have redundant power, so in theory this should not have happened. We are waiting to learn whether the power issue was internal or external to our systems.
10:50AM PDT – All of our servers are being brought back online one by one. We put AdSphere into maintenance mode because the database servers are still in recovery mode. As soon as the database servers are happy again, we’ll bring AdSphere out of maintenance mode.
11:23AM PDT – We are still struggling to get the database servers back to normal. Queries are running far too slowly for production, forcing us to leave AdSphere in maintenance mode for the time being.
12:10PM PDT – AdSphere has been taken out of maintenance mode and the database server is performing better.
We are still dealing with numerous issues. The network capture servers are still having problems, and it will take some time to get them all back online and working normally. Unfortunately, this means some data loss: we cannot capture airings data while the capture systems are down, so even after our systems are fully restored we will be unable to report airings for the period of the outage. Break type detection will also be impacted. Our hope is to get all of our systems back online over the weekend, if not sooner. Please know that we will be doing additional system maintenance over the weekend and AdSphere will be offline at times.
Since we began operation in 2014, DRMetrix has worked with an exceptional group of technical professionals as valued partners and vendors. It’s heartbreaking for our team when something like this happens and we feel terrible for inconveniencing our valued clients. DRMetrix and its technology partners will endeavor to determine the root cause of what occurred and what measures can be taken to prevent future occurrences. We appreciate everyone’s patience and sincerely apologize for this situation.
DRMetrix operations were largely back online by Friday afternoon. We are still experiencing some issues with our registration system. Registration of new ads received since Friday morning will be delayed as our team catches up over the next few days.
It’s been a little over two weeks. Our co-location provider determined that a brief power fluctuation across their electrical infrastructure on 4/26/19 impacted multiple DRMetrix circuits. The source was external to DRMetrix’s hardware and systems. Our equipment has since been moved to a different set of electrical circuits, and additional redundant power equipment has been installed. Many hours have been spent tracking down and resolving the numerous issues caused by the power dip and returning all of our systems to normal operation. One remaining challenge, which dragged on for the past two weeks, was a mysterious I/O issue that degraded video capture and playback quality for brief periods. It would come and go at random times and affect different networks each time. We estimate that airing detection accuracy was reduced by a few percentage points during this period. We believe we recently identified the culprit; since we took corrective action, no further I/O issues have been detected. We hope that all of the issues resulting from the power failure have now been resolved.
Thank you to our customers for all of your support and understanding.