Insane hiring, young talent and marketing dreams

May 4, 2009 · Posted in People

This is a cross-post from our Slog Blog:

http://pagalguy.com/slog/2009/05/05/young-talent-insane-hiring-and-a-marketing-dream/

Backing up woes

May 4, 2009 · Posted in Technology

You’ve got your website up and chugging along very well (on shared hosting, a dedicated server or a VPS) and life is good: traffic is increasing, you are hiring, and then one fine day the site crashes. You panic, check with your hosting provider and are told that the server crashed. The hard drives didn’t survive, and your account will be restored from a backup.

That backup could be a day old, a week old or a month old. It also matters whether the backup was stored on the server itself, on a different server, on an off-network backup setup, or some combination of these. Check with your provider about their backup setup: if your data really matters to you, be careful and proactive about finding the solution that works for you.

Why, oh why?

Some of the backup setups mentioned above may not save any of your data if the server is compromised. If you back up all your stuff onto the same server and a hacker finds a way into your machine, there is a good chance he will destroy both your primary data and your backups. You’re left with nothing, unless you had the foresight to keep other backups as well.

Let’s say you back up your stuff to another server within the data center. You are still not scot-free. The DC could go down, or get raided by the FBI (see the High Availability post). Or, more simply, the compromised server could be connected to the backup server through passwordless, key-based SSH login, which is more or less required to automate regular rsync backups. Here again you could lose all your data. Kinda scary, isn’t it?

All this makes a case for off-network backups, which survive anything that takes out the DC: an FBI raid, an earthquake, a flash flood or, err.. an errant truck crashing into a pole and knocking out the DC’s power.

But wait: if your primary machine can still log in to your external backup system without a password, a hacker can take out your data and your backups as well. This applies whether your external backup lives within the DC or outside it. Use a solution where nobody can log in to the backup system without knowing the login/password, and never store that login/password on the primary machine. Take a look at solutions like EVault or R1soft (we use this) to back up all your servers/accounts to an external provider.
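One common way to keep backup credentials off the primary machine is to reverse the direction of the backup. Instead of pushing from the primary, the backup host pulls. This "pull" model is a swapped-in technique, not the R1soft or EVault mechanism. The sketch below only builds the rsync command that the backup host would run. The host name, paths and the `backup` SSH user are hypothetical.

```python
import shlex


def build_pull_command(primary_host: str, remote_dir: str, local_dest: str,
                       ssh_user: str = "backup") -> list[str]:
    """Build an rsync command that the *backup* host runs to pull data.

    The backup host holds the SSH key for the primary, not the other way
    round, so an attacker on the primary finds no credentials for the
    backup host. Host names, paths and the 'backup' user are hypothetical.
    """
    # Trailing slash: copy the directory's contents, not the directory itself.
    source = f"{ssh_user}@{primary_host}:{remote_dir.rstrip('/')}/"
    return [
        "rsync",
        "-a",         # archive mode: recurse, keep permissions, times, symlinks
        "-e", "ssh",  # transport over SSH
        # No --delete: files wiped on a compromised primary are not
        # automatically wiped from this copy as well.
        source,
        local_dest,
    ]


if __name__ == "__main__":
    cmd = build_pull_command("web1.example.com", "/var/www", "/backups/web1/latest")
    print(shlex.join(cmd))
    # On the backup host you would then run: subprocess.run(cmd, check=True)
```

The primary still needs an SSH server that accepts the backup host's key. But a break-in on the primary no longer hands the attacker a path into the backup system.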

We use R1soft because of a few advantages it gives us. First, it does sector-level incremental backups, so it doesn’t use much outbound bandwidth: it transfers only the data that changed since the last backup (much like rsync). Second, it provides a control panel that lets us restore individual files and directories from any of our backups. We tend to maintain 30 snapshots of our servers at all times, and in some cases over 240. Finally, the killer feature is bare-metal restore. Say your box crashed: all you need to do is bring up a new box and point R1soft at it to restore everything. It will replicate everything as of the last snapshot, including the OS. Kinda life-saving if you ever need it. If you folks use any other backup setup, I’d love to hear about it :)
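To make the "only the changed data crosses the wire" idea concrete, here is a minimal sketch of block-level change detection. It splits data into fixed-size blocks, hashes each block, and ships only the blocks whose hash differs from the previous snapshot. This illustrates the general technique and is not R1soft's implementation. The 4 KiB block size and the function names are assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 of each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: list[str], data: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks that differ from the last snapshot.

    Only these blocks need to be sent; unchanged blocks are referenced from
    the previous snapshot. (A real tool would also record the new total
    length so that truncated data can be restored correctly.)
    """
    current = block_hashes(data, block_size)
    delta = {}
    for idx, digest in enumerate(current):
        if idx >= len(previous) or previous[idx] != digest:
            delta[idx] = data[idx * block_size:(idx + 1) * block_size]
    return delta


if __name__ == "__main__":
    old = b"A" * 8192 + b"B" * 4096
    new = b"A" * 4096 + b"C" * 4096 + b"B" * 4096
    print(sorted(changed_blocks(block_hashes(old), new)))  # [1]
```

Here only the middle 4 KiB block changed, so only that one block would leave the server.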

Even if you get everything else right, I’ve seen cases where all the hard work went down the drain because nobody verified the integrity of the backups on a regular basis. You may also want to restore your backups onto a spare server from time to time, to make sure you got it right. There are various backup options available today, open source and commercial, but the above are some of the problems we take seriously with our data and plan around. It is never possible to have a 100% secure setup: someone just needs to find one loophole or exploit, while you have to keep patching hundreds of them. But do spend the time and effort to build a backup system that reflects how much you value your data. You don’t always need to spend a bomb on a backup system if you can live with losing a day’s worth of data :)
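A cheap first step toward regular verification is a checksum manifest. Record a SHA-256 for every file when the backup is taken, then later check that the backup copy still matches. The sketch below assumes plain file-level copies (for example, ones made with rsync). It is not how any particular commercial product verifies its snapshots, and it does not replace an actual test restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(manifest: dict[str, str], backup_root: Path) -> list[str]:
    """Return relative paths that are missing from the backup or whose checksum differs."""
    problems = []
    for rel, expected in manifest.items():
        candidate = backup_root / rel
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel)
    return problems
```

Store the manifest away from the primary machine, for the same reasons as the backups themselves. An empty list from `verify_backup` means every recorded file is present and unchanged.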

Just as you think critically about HA, think about your backup solution with the same rigor, in proportion to the importance you place on your data.

High Availability

May 1, 2009 · Posted in Technology

Every time we grow and traffic sets new records, we are always happy to need more servers in our rack. That happiness soon makes me cringe, though, because I know adding all these servers is going to cost more. What is even tougher is keeping the whole set of servers fast, secure and easy to operate.

I’ll focus on the high availability (HA) challenges of such a setup and on why the costs get steep very quickly once you require HA. The challenge for a startup is to design an architecture that lets you start at the right scale and then extend it as painlessly as possible.

For an early-stage startup with minimal traffic, you can get by with a single server handling files, databases, email and the web server. After you’ve grown for a while, the database is the most likely bottleneck, unless you are serving tons of files really fast. At that point you need a separate server that runs just your MySQL installation. Right after that, you realize you need more machines on the front end to serve all the awesome goodies.

This cycle is vicious, and very soon you will have a couple of machines acting as the front end and a couple of database servers. To keep MySQL playing nice, you should by now have sharded your data across multiple machines and/or set up slave MySQL servers that your application sends its reads to. You deserve a pat on the back for generating this much traffic, but the next set of challenges is just starting.
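Sending reads to slaves usually means the application, or a layer in front of the database, routes each statement. The sketch below sends writes to the master and rotates reads across the slaves round-robin. The class name, host names and the prefix-matching rule are all hypothetical simplifications. A real router would return connections rather than host names, and it would have to handle cases like `SELECT ... FOR UPDATE` and reads that must see a write that has not yet replicated.

```python
import itertools


class QueryRouter:
    """Send writes to the master and spread reads across slaves round-robin.

    Host names are hypothetical; a real app would hand back connections,
    not strings.
    """

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE",
                      "CREATE", "ALTER", "DROP")

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        # With no slaves yet, all reads fall back to the master.
        self._reads = itertools.cycle(slaves or [master])

    def route(self, sql: str) -> str:
        """Return the host that should execute this statement."""
        statement = sql.lstrip().upper()
        if statement.startswith(self.WRITE_PREFIXES):
            return self.master
        return next(self._reads)


if __name__ == "__main__":
    router = QueryRouter("db-master", ["db-slave1", "db-slave2"])
    for q in ["SELECT * FROM users", "INSERT INTO users VALUES (1)", "SELECT 1"]:
        print(router.route(q), "<-", q)
```

Replication is asynchronous in a typical MySQL master/slave setup. That is why read-after-write paths (for example, showing users the post they just submitted) often go to the master anyway.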

This is when you start worrying about single points of failure. HA simply means your system stays up and alive even when certain parts of the infrastructure come down on you. Hopefully you put in a hardware load balancer, or a failover pair of software load balancers (HAProxy comes to mind). If you ran a single load balancer on one server instead, that one server going down is all it takes to bring your entire site down. All your front-end and back-end servers come to naught if your load balancer itself has no failover ;)
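Failing over a pair of software load balancers usually comes down to a heartbeat. The standby keeps checking the active node and takes over the shared IP after several checks in a row go unanswered. Tools such as keepalived implement this idea with VRRP and a floating IP. The sketch below models only the standby's decision logic. The class, the threshold of 3 missed heartbeats and the "no automatic failback" policy are assumptions.

```python
from dataclasses import dataclass


@dataclass
class StandbyBalancer:
    """Passive load balancer that takes over when the active one stops answering.

    max_missed is a hypothetical threshold; requiring several misses in a
    row avoids failing over on a single dropped heartbeat.
    """
    max_missed: int = 3
    missed: int = 0
    active: bool = False

    def on_heartbeat(self, primary_alive: bool) -> bool:
        """Record one heartbeat check; return True if this node should be serving."""
        if primary_alive:
            self.missed = 0  # any answer resets the counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Here a real setup would claim the shared/virtual IP.
                self.active = True
        # Once active, stay active even if the old primary comes back:
        # failing back is left to a human, to avoid flapping.
        return self.active
```

The trade-off is in `max_missed`. Set it too low and a brief network hiccup triggers a failover. Set it too high and the site stays down longer before the standby steps in.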

If you have been operating on a really tight budget, this is where major chinks in the architecture start showing up. What if the master MySQL server goes down? These are usually fairly expensive, beefy machines, but hardware fails, and it usually fails when you are least prepared (hail, Murphy!). Now come the choices you need to make. Do you put up an equally beefy standby MySQL server that waits for the 1% of the time your master goes down, and then fail over to it? Or do you, err.. just chalk that 1% downtime up to.. err.. you know.. keeping it cheap? Or you could make MySQL highly available with a master + master configuration. However, if you need to take frequent backups of your data, you may need a master + master + slave configuration. That way you can shut down the slave for a brief period, take all those important backups and send them to your backup systems. As you add machines to the setup, make sure all the servers are on a private VLAN connected with gigabit cards/networks. You get fast interconnects, plus you don’t have to pay for internal bandwidth.
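Master + master setups have a classic trap: both masters accept inserts, so both can hand out the same auto-increment primary key. The usual remedy is MySQL's `auto_increment_increment` and `auto_increment_offset` server variables. The post does not mention these, so treat this as added background. The sketch below models the IDs each master would generate from an empty table, with an increment of 2 and offsets 1 and 2. It is a simulation, not a MySQL client.

```python
def auto_increment_ids(offset: int, increment: int, count: int) -> list[int]:
    """IDs a server would hand out with auto_increment_offset=offset and
    auto_increment_increment=increment, starting from an empty table."""
    return [offset + i * increment for i in range(count)]


master_a = auto_increment_ids(offset=1, increment=2, count=5)  # [1, 3, 5, 7, 9]
master_b = auto_increment_ids(offset=2, increment=2, count=5)  # [2, 4, 6, 8, 10]

# Odd IDs come from one master and even IDs from the other, so they never collide.
assert not set(master_a) & set(master_b)
```

With more than two masters, the same pattern holds: set the increment to the number of masters and give each one a distinct offset.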

Let’s also talk about backups. You do have backups, dontcha?! Backing up the files/systems on the front-end servers shouldn’t be too tough. You could rsync them over to another location and keep off-network backups, or use bare-metal restore solutions like R1soft and be able to restore entire servers very easily. The cost of additional backup servers, and of the bandwidth to take snapshots at short intervals, will drive your decisions as you work to keep everything safe and secure. Explore multiple backup systems and practice restoring from them: that practice is what gets you back online fast. Keep a copy of your data backed up off network. This is critical, because you never know when your DC might be raided by the FBI and you lose all your servers. All right, if it’s not the FBI, it could be an earthquake, bankruptcy, power surges, a building fire, whatever. If your company means more to you than the monthly server costs, back up relentlessly.
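Frequent snapshots pile up quickly, so any backup setup needs a retention rule. The sketch below keeps the newest N snapshots and returns the rest for deletion. The default of 30 echoes the snapshot count from the backup post above. The function name and the use of plain timestamps to identify snapshots are assumptions.

```python
from datetime import datetime, timedelta


def snapshots_to_prune(snapshots: list[datetime], keep: int = 30) -> list[datetime]:
    """Return the snapshots to delete (oldest first) so only the newest `keep` remain."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(snapshots, reverse=True)
    return sorted(newest_first[keep:])


if __name__ == "__main__":
    # 35 hypothetical daily snapshots starting May 1, 2009.
    daily = [datetime(2009, 5, 1) + timedelta(days=i) for i in range(35)]
    for snap in snapshots_to_prune(daily):
        print("would delete", snap.date())
```

Check that the newest snapshot actually verifies before pruning anything. Otherwise a broken backup job can quietly push your last good copies out of the retention window.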

As you keep growing and thinking HA, make sure no single point of failure can jeopardize all your hard work. One site I knew had a fairly comprehensive HA setup but kept its DNS on a single server. Sadly, when that server went down, nothing remained accessible. Take it as a gentle reminder: when going HA, work hard to find every such dependency and remove it as soon as you can :)
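One simple way to hunt for dependencies like that lone DNS server is to write down every role the site needs to stay reachable, along with the machines providing each role. Then flag any role with fewer than two machines. The sketch below does exactly that and nothing more. The role names and hosts are hypothetical. A real audit also has to catch shared hidden dependencies, such as two "redundant" servers on the same switch or power feed.

```python
def single_points_of_failure(required: list[str],
                             instances: dict[str, list[str]]) -> list[str]:
    """Return every required role served by fewer than two distinct machines.

    `required` lists the roles the site needs to stay reachable; `instances`
    maps each role to the machines providing it. Roles and host names are
    hypothetical.
    """
    return [role for role in required if len(set(instances.get(role, []))) < 2]


if __name__ == "__main__":
    required = ["dns", "load_balancer", "web", "mysql_master"]
    instances = {
        "dns": ["ns1"],
        "load_balancer": ["lb1", "lb2"],
        "web": ["web1", "web2", "web3"],
        "mysql_master": ["db1", "db2"],
    }
    print(single_points_of_failure(required, instances))  # ['dns']
```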