First off, huge apologies for the site's wonkiness and offlininess the last few days. Thanks to the tireless work of Stafftech, Mo and Iz, we're back up and running. Also thanks muchly to the admins for keeping everyone calm during the hurricane.
So I'm sure there are some tech geeks like me who are dying to know what happened and how things were fixed. No? Well, I'm just going to post this anyways, for posterity's sake. One day a few thousand years in the future, when Worth1000 is studied by anthropologists looking to write a history book on the birth of digital editing, I want to make sure they get this particular chapter right.
Over the weekend (3/11), we started noticing Worth getting slower and slower. And so did many of you. It also started to lock up and would eventually fail. As things went on, they just got worse and worse. By Tuesday the site needed to be rebooted every other hour.
We looked into the cause and found we were getting slammed by webcrawler bots identifying themselves as coming from Google, Yahoo, Cuil and some other weird unknown ones. Thinking it too weird a coincidence for all these companies to index Worth1000 at once, and that it really must be a malicious attack, we started turning off access to any visitor that identified itself as coming from these companies.
We put a little blurb in place that said "We don't like you", intending it to only show up for them. But instead it showed up for everyone on the site for a few minutes until we caught the problem. Whoops.
Putting this block into place didn't fix things. And the site was shutting down more and more frequently. Then the hard drive simply stopped working. Period. The constant reboots were too much for it and it simply died.
Rather than rebuild the server on a new hard drive, we decided to take the opportunity to move Worth to Amazon EC2. So we copied all the code overnight and provisioned a server there.
Amazon EC2 is a really cool "virtual" server that can be almost instantly cloned and deployed. [We're in the process of setting up Worth so that it can run on multiple virtual servers in EC2, which means we'll have redundancy in case of future issues.]
Once on EC2 the application started to crash, yet again. So we knew we had a problem with the application itself, but we hadn't changed the code recently... so what could it be??
After turning off various pages on Worth1000 to find the problem, we realized that any pages that stored large amounts of gallery content in memory were causing the site to grind to a halt and fail. As we have been adding all of Worth's contests to memory as a new way of improving performance, memory usage had climbed further every day since we launched the new site. It seemed that we had finally hit the upper limit of what the server could handle.
So we needed a different way to store our entire index that wasn't in memory but was just as quick. Reading from a database or a cache system was just as problematic. Stafftech and Mo used a search indexing library called Lucene to store all of the gallery and contest data (and anything else that had been stored directly in memory). And lo and behold, the system now works beautifully and quickly.
There are still some kinks to work out since moving to EC2, but they are easy compared to the performance issue. The next step for us is to finish ironing out the kinks and turn back on anything that was disabled during the nightmare. Finally, we're going to provision Worth1000 on multiple EC2 instances so that if one server goes down, you won't even notice it.
Hope this was an interesting read for most of you. If it wasn't, here's a kitten:
(c) Merlijn Hoek
KITTEN!
Some evil parts of me want this to happen again so I can read more stories like this.
I found the read very informative. Also thanks for the kitten.
/start mode=Boo
KITTY!!!!
/end mode=Boo
Wow! Now go take a nap, you people deserve it!
On a side note: Did any of you worthians get the chance to see "we don't like you"? Did you panic thinking you were banned or something? hehehehe
Ddallas said: On a side note: Did any of you worthians get the chance to see "we don't like you"? Did you panic thinking you were banned or something? hehehehe
LMAO... I didn't see it, but if I had... I might have cried!
Ddallas said: Did any of you worthians get the chance to see "we don't like you"?
I get that all the time.
In private messages and email, not on the main page.
cool. you all rock!
just curious, what became of the web bots?
We are blocking them still. But they are no longer helping to disrupt the site - with a properly functioning application it's not even a nuisance. For all we know we've been getting hit by them for months and we only started to notice once the memory issue came up.
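For the curious, here is a minimal sketch of what a user-agent block of this kind might look like. It is not the actual Worth1000 code: the token list, the function names and the 403 status are all assumptions made up for illustration. The point is that the check has to key off the visitor's User-Agent string, so the "We don't like you" blurb only reaches the matching crawlers and everyone else gets the normal page.

```python
# Hypothetical substrings to look for in the User-Agent header.
# These are illustrative guesses, not the list used on the real site.
BLOCKED_AGENT_TOKENS = ("googlebot", "slurp", "twiceler")


def is_blocked(user_agent):
    """Return True if the User-Agent string contains a blocked token."""
    if not user_agent:
        # No header at all: treat as a normal visitor in this sketch.
        return False
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_AGENT_TOKENS)


def respond(user_agent):
    """Return a (status, body) pair for a request with this User-Agent."""
    if is_blocked(user_agent):
        return 403, "We don't like you"
    # Everyone else falls through to the regular site.
    return 200, "Welcome to Worth1000"
```

Since a bot can send any User-Agent it likes, a block like this only stops crawlers that identify themselves honestly or consistently.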
A big thank you to stafftech, Mo and Iz.
And yes, I did read the whole post and found it interesting. And yes, I did get the "we don't like you" message. And I did raise an eyebrow.
yep I saw it, I went humph and hit refresh, 'twas gone
I'm a little bit of a techie geek (which is why I'm reading the story) so I knew it was a "bad bot" message and I didn't cry :)
LOL @ the Kitten, and good luck in the Cloud, the Sky is the limit now heh ;)
JaxomLOTUS said: We put a little blurb in place that said "We don't like you" intending it to only show up for them. But instead it showed up for everyone on the site for a few minutes until we caught the problem. Woops.
This is brilliant, by the way. I've been doing some web building on campus, and I wish I could get away with something like that.
If you think there are legit uses for bots but don't want to have them eat your CPUs, you can always do a "tarpit" where each successive hit from a machine gets slowed down a little more and a little more. Slow enough so that humans don't notice, but bots eventually get stuck in the tar. The web threads for them are in a sleep state, so they don't take up CPU time.
I did that for the directory search engine here at CMU, where bots were trying to harvest email addrs for evile spam purposes, and it worked very well. I'd count how many hits (N) I'd gotten from the client's IP address and I'd just sleep for N^2 milliseconds before servicing the request. Then if the user went away for over an hour, I'd reset their N to 0.
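The tarpit described above can be sketched roughly like this in Python. It is a reconstruction from the description, not the code that ran at CMU: the class and method names are invented, and it assumes N counts the current request too, so the first hit waits 1 ms. The clock and sleep functions are passed in so the logic can be tried out without actually waiting.

```python
import time

RESET_AFTER_SECONDS = 3600  # "went away for over an hour"


class Tarpit:
    """Per-IP tarpit: the Nth hit from an address sleeps N^2 milliseconds."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        # ip -> (hit count, time of last hit). Never pruned in this sketch,
        # so a real deployment would need to expire old entries.
        self._hits = {}

    def delay_ms(self, ip):
        """Record a hit from ip and return the delay to apply, in ms."""
        now = self._clock()
        count, last_seen = self._hits.get(ip, (0, now))
        if now - last_seen > RESET_AFTER_SECONDS:
            count = 0  # client was quiet for over an hour: start over
        count += 1
        self._hits[ip] = (count, now)
        return count ** 2

    def throttle(self, ip):
        """Sleep for this hit's delay before the request is serviced."""
        delay = self.delay_ms(ip)
        self._sleep(delay / 1000.0)
        return delay
```

A request handler would call `throttle(client_ip)` before doing any real work. Under these assumptions the 10th hit waits 100 ms, which a person barely notices, while the 100th waits 10 seconds and the 1000th nearly 17 minutes.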
Fascinating! I thought it was that!!! :)
cmumiddle said: bots were trying to harvest
Sounds like an awesome Sci-Fi movie.
Could I have a dog instead? LOL
Thanks to all who got the site up and running again!
Many thanks for explaining, it's really appreciated that you are taking the time to do this after all the work you have had to get things up and running again. A big star from me! Now go home and get some sleep :D
Edit:
That's a pretty Kitteh!
I'm not a techie by any stretch of the imagination, but I still enjoyed the read, and actually understood most of it!
Now... if only I had entered a picture much like the pretty kitty up there, I would have won Round 1 of the HXH Tourny. :(
Oh wow. Give the team a bonus day off. Or at least cookies. Tell them we wuv them.