You may have been caught up in the recent ‘unpleasantness’ and disruption to sl, during which we were plunged back into the good old days of massive grid-wide failures and login problems – all very 2007! In its wake, in a refreshing change from the usual bland ‘unscheduled maintenance has been completed’ message, the Lab followed up with this rather candid and honest blog post.
It’s amazing how an explanation can make things a lot easier to bear, even though it makes little difference to the original problem – being told what went wrong goes a long way towards aiding our understanding of not only what the problem was, but also the efforts that were made to fix it, plus – for those among us who like nothing better than to badmouth the Lindens – it can reveal the true culprits, at whose feet the blame may lie, and who may not actually be who we assumed them to be.
There is, for example, very little that the Lab could have done to either anticipate or ameliorate the impact of a DDOS attack on one of their upstream service providers – these things are a fact of life on the internet – deal with it. If you want to lay the blame on anyone, then concentrate on the cybertards who enjoy bringing down networks for fun, and the millions of plonkers who allow trojans to set up home on their computers… it may be a sobering thought to consider that if your approach to internet security is sloppy, your computer may well have contributed to bringing sl down that day you couldn’t log in! Next time, the Lab has declared they are prepared, with another service provider waiting in the wings to step in, if things get sticky. (And before you say they should have thought of that beforehand, can i assume that you have a backup provider if your own ISP goes tits up?).
The DDOS attack aside, Landon Linden gives a comprehensive breakdown of what caused the, er… breakdown, why it caused so many problems and what the Lab are doing about it. There will certainly be a bunch of know-it-alls who will criticise the Labbies anyway – but before we go off on one, berating them for allowing a 10-year-old problem to still exist, let alone manifest itself in such a public way, let’s just bear in mind that sl is a very complex system, and complex systems will – by their very nature – be riddled with potential failures and faults, no matter how carefully they are managed or how up-to-date they may be: and age is a big consideration.
In IT terms, sl is a venerable old lady – the MySQL database that failed so spectacularly has been a core service of sl right from the beginning: whilst it has remained pretty much unaltered over time, a whole range of functions, calls, systems, processes and code has been piled on and around it, much of which is highly specialised and reliant upon a whole raft of other things if it is to work correctly. It’s a bit like building a machine out of lego over a long period of time – as we add to it, alter and fiddle with it, cobble together add-ons and continually improve our model with those snazzy new bricks and accessories that didn’t exist a few years back, there will come a time when the core of our machine can no longer support all the bits and pieces we’ve added. The choice we’re faced with is either to strip down everything and rebuild from the core upwards, or add newer, supplementary cores that gradually make our original design redundant – either way, it’s ultimately an aging system that we’re trying to rejuvenate.
SL is over 10 years old: inevitably it has creaky joints, failing senses and a quirky, sometimes demanding, nature. There will be days – no doubt about it – when it will fall down, no matter how much care and attention it gets; those are the unavoidable, if unpalatable, facts. Anyone who drives a classic car, collects vintage cameras or anything similar will understand only too well! Many might seize on that to gleefully proclaim that sl is dying, but – on the contrary – i see a brighter future:
When i read Landon’s post, it is clear that the Lab are investing time, effort and money into putting sl right – that’s not something you do if you’re not planning on keeping something running; you either let it run down and die naturally, or sell it off for scrap. Instead, what the Lab say they are doing is looking at how to rejuvenate this decrepit old system, flush out the gremlins and give it bright, shiny new parts that will keep it running better than it ever did before… SL isn’t dying, it’s just a little tired and jaded at the moment. A good service, oil change and a new set of tyres, and it’ll be back on the road again to continue its journey.
To be honest, i can live with a few failed logins, the odd breakdown and the occasional bit of maintenance if it means that sl will not just keep going, but will do so in the future, better, faster and stronger than ever before.
View it, code it, jam – unlock it,
Surf it, scroll it, pause it, click it,
Cross it, crack it, switch – update it,
Name it, rate it, tune it, print it,
Scan it, send it, fax – rename it,
Touch it, bring it, pay it, watch it.
Pentatonix – Daft Punk