Back from the Dead: Simple Bash for a Complex DDoS

If you work for a company with an online presence long enough, you'll deal with it eventually. Someone, out of malice, boredom, pathology, or some combination of all three, will target your company's online presence and resources for attack. If you are lucky, it will be a run-of-the-mill Denial of Service (DoS) attack from a single IP address or a limited range of addresses that can be easily blocked at your outermost point, and the responsible parties will lack the expertise to overcome this relatively simple countermeasure. The usual script-kiddie attack against a site with competent network and server administration is fairly short-lived. If you are unlucky, you'll experience something worse: a small percentage of attacks come from a higher caliber of black hat, and while these are more difficult to deal with, such an individual generally bores easily and moves on.

If you are very, very unlucky, someone highly skilled and just as determined will decide to have some fun with you. If this person decides to crack their way into your servers and explore your environment, eventually they will get in, and there isn't much you can do about it. As long as they don't do anything too obvious, like launch a huge dictionary-cracking attack against other sites from your servers, you may never know, even if you are pretty good and attentive. And if they decide they want to knock you off the Internet, then down you go.

I had the misfortune to be on the receiving end of such an attack at a previous employer who shall remain nameless (but it was in 2007, and my LinkedIn profile is public: http://www.linkedin.com/in/gregbledsoe). Someone didn't seem to like us very much and decided to erase us from online existence. At first it was a standard DoS SYN flood that any script kiddie could launch, a minor annoyance at best, easily mitigated by blocking the source IP at the point of ingress. Then it got interesting.

The attacker adapted by engaging a substantial botnet, and it became a Distributed Denial of Service (DDoS) attack. The targeted server address was down briefly until we engaged our carriers to block the inbound attack further out. Still, at that point the crisis is over, right? Normally, yes. In this case? Not even close.

The attacker adapted the attack *again*, this time appearing to rotate through connections from real botnet systems while also sending oodles of fake connection requests from random spoofed IP addresses. All told, the number of incoming connection requests was close to a million at a time. This took us down hard. Panic ensued, and after some quick brainstorming a number of mitigation techniques were attempted, all to no avail. The connections went through our firewall, through our load balancer, and hit one of three back-end systems, all of which were overwhelmed dealing with the load imposed by the attack. We tried rate limiting on the firewall, and while I'm not sure exactly what was implemented, this took everything behind the firewall down, not just the targeted URL/server address. The rate-limiting statements were taken back out of the configuration, but everything stayed down. We discovered that the firewall equipment had run out of memory creating table space to keep track of all the connection attempts. It couldn't tell the difference between spoofed, real, and legitimate TCP SYN connection requests, so it tracked them all and let them through. Apparently the particular equipment we had did not allow more granular rate limiting. Options were discussed, including rejiggering our DNS to send all our traffic through a (very expensive) company that promised to scrub the attack before it reached us. I was skeptical of this idea.

Being the Unix Guy, my domain was the back-end servers and, to a lesser extent, the load balancer. After watching the output of netstat, lsof -ni, and tcpdump for a while, I knew how to defeat this attack. I spent about 10 minutes crafting my countermeasure, deployed it on all three back-end servers, and within seconds our environment was alive again. The red of the Nagios alarms cleared within a few minutes, and our phones stopped ringing. Our total downtime was about an hour.

What made this countermeasure work was that there was a clear threshold between the number of connections opened by legitimate users and the high number of connections from both the real and spoofed IPs that were part of the attack. By identifying the offenders on the back-end servers and sending TCP resets (packets with the RST flag set) back on all connection requests over the threshold, we could clear out the connection information on the server, the load balancer, and the firewall and free up the memory that had been used to store each table entry. Clear out enough of them quickly enough, faster than new attack IPs were coming in, and life becomes good again.

Here is the (very simple) script I ran on all three servers.

#!/bin/bash
while [ 1 ] ;
do
  # The line below gets the list of all connections and connection attempts,
  # produces a list of unique IPs, and iterates through that list.
  for ip in `lsof -ni | grep httpd | grep -iv listen | awk '{print $8}' | cut -d : -f 2 | sort | uniq | sed s/"http->"//` ;
  do
    # Find how many connections there are from this particular IP address.
    noconns=`lsof -ni | grep $ip | wc -l`;
    echo $ip : $noconns ;
    # If there are more than 10 connections established or connecting from this IP...
    if [ "$noconns" -gt "10" ] ;
    then
      # echo More;
      # echo `date` "$ip has $noconns connections.  Total connections to prod spider:  `lsof -ni | grep httpd | grep -iv listen | wc -l`" >> /var/log/Ddos/Ddos.log
      # To keep track of the IPs, uncomment the two lines above and
      # make sure you can write to the appropriate place.
      # For these connections, add an iptables statement to send
      # resets on any packets received.
      iptables -I INPUT -s $ip -p tcp -j REJECT --reject-with tcp-reset
    else
      # echo Less;
      :
    fi;
  done
  sleep 60
done

Our attacker made a number of attempts to adapt to this solution, trying, for instance, to have sections of the botnet start at some IP, like 1.1.1.1, and send one connection apiece, rotating through IPs as quickly as possible to avoid tripping the threshold, but he couldn't rotate quickly enough to wreak the same level of havoc as before. This script proved very robust against the rest of his attacks. Some fine-tuning was done, for instance to remove iptables rules after they had aged a certain amount, but the essence of the script remained the same.

What I really liked about this solution was the simplicity. I have found that the best solutions are usually the simplest. If you really understand the underlying technology and protocols, then you can often see right through to what underlies a problem, and avoid adding layer after layer of expense and complexity (and corresponding break points) to your environment.

I'm more than willing to release this under the GPL v2. If anyone is interested in incorporating this snippet or its concepts into a larger solution for distribution, let me know via the email address below.

______________________

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Comments

: instead of [ 1 ]

Anonymous's picture

You could use

while :;

instead of

while [ 1 ];

I really believe the

Panagiotis Papadomitsos's picture

I really believe the important lesson here is that rejecting packets instead of dropping them can help the surrounding network get a hint of what's going on and mitigate the situation, even though dropping the packets may superficially seem more effective (because it does not create any more traffic on an already heavily burdened network, whereas REJECT does).

However, one must pay attention to very large DDoS attacks, where this simple method can fill up the iptables table with lots and lots of rules, adversely affecting CPU and memory usage to the point where the mitigation itself becomes an autoimmune disease. I have yet to see such a case on physical servers, whereas OpenVZ containers can easily die because of this, courtesy of the numiptent limit that UBC-based containers are subject to.

Another simple script

John T Sharpe's picture

Very good idea.

Here is something similar that uses netstat to count connections and has a configurable whitelist and a configurable number of connections.

http://deflate.medialayer.com/

Good ideas

Greg Bledsoe's picture

Wow. Remarkably similar. Any idea when this was written? (I really want mine to have been first! Come on 2008 or later!)

The reality is that most good ideas occur to more than one person at different times... like the debate over who invented the light bulb first, or calculus...

Bottom line is that it works: simple, effective, fairly lightweight. I'll have to take a closer look at deflate and see if I can contribute at all... it looks pretty complete, though.

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Different approaches to the same problem

John Sharpe's picture

Greg, I can see advantages to your approach also. Sometimes you would not want to go through the setup and the extra complexity when in an emergency situation. It's good and varied tools that help us be efficient at our jobs.

The only real difference

Greg Bledsoe's picture

The only real difference is that I switched to REJECT --reject-with tcp-reset while he uses DROP. DROP leaves the record and memory usage in place on stateful network gear for the connections. I would say that is a slight advantage to my solution in some circumstances, but it's easy to change in Ddos.sh.
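
For reference, the one-line difference looks something like this ($ip stands in for the offending address; DDoS-Deflate's actual rule may be worded slightly differently):

# what Ddos.sh inserts: send a TCP reset, which lets stateful gear clear its entry
iptables -I INPUT -s $ip -p tcp -j REJECT --reject-with tcp-reset
# the DROP variant: silently discard, leaving upstream state tables to keep tracking
iptables -I INPUT -s $ip -j DROP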

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

He did it earlier... :)

ztank's picture

This is reported in the ddos.sh file:

echo "DDoS-Deflate version 0.6"
echo "Copyright (C) 2005, Zaf "

Ciao.

Well darn

Greg Bledsoe's picture

I didn't see that in my first cursory look over it, since it isn't at the top of the file.

Then I guess the kudos are yours, Zaf. You did it first. Darn you. :-D

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Very Clever, Understanding What Endpoints are being attacked

Bruce Cramer's picture

Hi Greg:

I must say it's rare to find the correct solution to defend against web DDoS attacks.

Your solution is simple, effective and very clever. Kudos!!!

To defend against a web DDoS in panic/crisis mode, most folks ultimately get their ISPs involved upstream. ISPs can/will cause blockage of legitimate business services, causing hundreds of help desk phone calls, exactly what the attacker wanted to accomplish.

Preventing the DDoS attack at the correct endpoint, the web/application servers exposed to the public Internet, is by far the best solution to the issue.

Can you please share the nightly cron job that you ran to remove stale reject rules?

I am a Linux/bash newbie learning more things every day. Does your bash script run every 60 seconds continuously, until the servers are rebooted?

Regards,

Bruce C

It *was* 2007

Greg Bledsoe's picture

I didn't save that cron job -- but it really shouldn't be too terribly difficult to replicate. If I can squeak some time out of my day I'll take a stab at it.

I would run the script with "nohup [command] &", which would only stop if killed specifically by PID or name, or with a reboot. A reboot seems like overkill, though; "kill [pid]" should do it.
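
To sketch that out (assuming the script has been saved as /root/Ddos.sh, a hypothetical path):

# launch it so it survives logout; it wakes every 60 seconds on its own
nohup /root/Ddos.sh >> /var/log/Ddos/nohup.log 2>&1 &
# and later, to stop it by name instead of hunting for the pid:
kill `pgrep -f Ddos.sh`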

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Maybe a filter in the application program can do the job

Anonymous's picture

I have written a filter for Tomcat that counts parallel connections from the same IP. If the counter reaches a threshold, it shuts down the connection with "shutdown(fd, SHUT_WR)", so that the server will send back an RST. I also sample memory usage; if a request is pending and there is not enough memory, I drop it.

Interesting.

Greg Bledsoe's picture

I'd like to see that, too! That could certainly work in certain circumstances.

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Absolutely beautiful

Mauricio López's picture

This is the kind of solution I love to use. I have learned to love bash, it saves a lot of time and money. This script is definitely going to my toolbox.

Thanks!

Greg Bledsoe's picture

And, I concur wholeheartedly!

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

2 lines

Alexander Economou's picture

Wouldn't something like :

iptables -I INPUT -p tcp --dport 80 -i eth0 -m state --state NEW -m recent --set
iptables -I INPUT -p tcp --dport 80 -i eth0 -m state --state NEW -m recent --update --seconds 60 --hitcount 10 -j DROP

do the job without the actual script?

-
Alex

Something else occurred to me about this

Greg Bledsoe's picture

These iptables statements will let every IP continue to send *up to* 10 connection requests per minute. That wouldn't really have helped us, given the number of IPs being used in the attack; we needed to identify, then reject and clear, *all* connection attempts from the "bad" IPs. I looked over the current iptables man page and don't see a way to do that without some scripting.

But I appreciate you provoking me to look! Always learning. :-)

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Almost

Greg Bledsoe's picture

But not quite. I've had quite a few problems getting --update and --hitcount to work together correctly, or more properly, as I expect them to. It's entirely possible that the issues I encountered are no longer relevant, but I've not tested it recently. Second, what your iptables lines catch is connection requests within 60 seconds; what the script catches is simultaneous outstanding connections. That's a slight but meaningful difference, and it could, in the right circumstances, make all the difference.

As an aside, DROP isn't what you want in this case. DROP leaves the tracking burden on all the stateful gear between you and the endpoint, which doesn't fix the problem.

Good suggestion though!

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Good

djkpengjun's picture

Great minds think alike.

Good way to learn about bash and networking

Ewen's picture

Hi Greg,

Thanks for the article, I plan to work through the script as an exercise to improve my knowledge of bash and networking, particularly the lsof command.

To add my 2 cents, similar to Pablo: if you ever need the script again, I believe it is more efficient to use grep -c rather than piping to wc -l (I think I read that in another LJ article??). Probably a negligible improvement, but hey, why not? :)

Thanks again,
Ewen

Not a bad point either

Greg Bledsoe's picture

It's entirely possible grep is faster at counting lines; this isn't something I've tested personally, though it seems (uneducated guess alert) that grep is optimized for searching while wc is optimized for counting. I'd suspect wc is more resource-efficient, though I could very well be wrong.

Now I will be irresistibly drawn to test it and unable to sleep until I do. Thanks! ;-P

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Sorry, make myself clearer

Ewen's picture

Sorry, I meant not using grep -c as a drop-in replacement for wc -l, but using it in the example of:

noconns=`lsof -ni | grep $ip | wc -l`;

replace with:

noconns=`lsof -ni | grep -c $ip`;

As grep has already done all of the searching work anyway.

Cheers,
Ewen

This might make an interesting article in itself

Greg Bledsoe's picture

That is, how to get the answer to a "which is faster or more efficient" question when it comes to bash scripting. Using time, I found that, as I suspected, wc is *much much much* faster than grep -c, but that excludes the time for subshell spawning that would be involved in piping.

Generally, bash built-ins and one-shot, single-purpose commands are way faster at what they do than the big Swiss-Army-style utilities like awk, sed, or grep. cut is faster than tr, tr is faster than sed, sed is faster than awk, and so on. But adding in piping and its associated overhead muddies the picture a little.

Maybe I *will* write that article. :-D

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

How did you test?

Ewen's picture

I found that using grep -c was faster than piping to wc. Maybe because of what you mentioned with piping overhead and what not. I agree that wc would be faster than grep, but if you have to use grep anyway to perform the search, may as well just use it to count?

I tried this:

mybigfile.txt is 881M and just created by cat'ing /usr/share/dict files together a bunch of times.

$ time grep -c a mybigfile.txt
50224464

real 0m8.251s
user 0m8.110s
sys 0m0.120s

$ time grep a mybigfile.txt | wc -l
50224464

real 0m10.991s
user 0m11.610s
sys 0m0.320s

So basically, what it comes down to is: yes, that would be an interesting article, and I'd like to read it :)

Intriguing

Greg Bledsoe's picture

I wish I had more time to do further testing. I'd be interested to see whether versions make a difference, and whether the complexity of the grep does. :-D

I just put it on my list.

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

I hear ya

Greg Bledsoe's picture

I still need to test though. :-D I suspect it'll be a close call between wc keeping a cumulative count vs grep tracking it on the way through. :-D

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Very nice script, thanks for sharing but...

ztank's picture

...I have a doubt. The rules you added with repeated commands like:

iptables -I INPUT -s $ip -p tcp -j REJECT --reject-with tcp-reset

reset all the connections coming from a pool of IPs you have previously selected as potential attacker IPs (spoofed and non-spoofed).

How many IPs are we talking about? You should have some numbers (/var/log/Ddos/Ddos.log).
If these IPs are numerous (especially the spoofed ones), then with the iptables rules you are potentially also blocking ordinary users who, after the DDoS is over, are trying to hit your web servers...

Do you delete the iptables rules after a while?

Just asking here because I am very interested in fully understanding your bash script.

Cheers,
ztank

Very good point

Greg Bledsoe's picture

That is an excellent question. Thanks for asking! In fact, I ran a nightly cron job that removed reject rules that hadn't been hit in a certain amount of time. We tuned that out of exactly the concern you raise, blocking actual users, but eventually proclaimed it "good enough" when we had only one complaint over several days from an actual user who couldn't reach us, which we tracked back to an iptables rule.
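
I didn't keep the exact job, but a minimal sketch of that idea, using the per-rule packet counters and assuming the only REJECT rules in the INPUT chain are the ones Ddos.sh added, might look something like this:

#!/bin/bash
# Delete REJECT rules whose packet counter is still zero, meaning the IP
# hasn't attempted a connection since the counters were last zeroed.
# Delete from the highest rule number down so the numbering stays valid.
iptables -nvL INPUT --line-numbers | awk '$4 == "REJECT" && $2 == 0 {print $1}' | sort -rn | \
while read num ; do
  iptables -D INPUT $num ;
done
# Zero the counters so tomorrow's run measures a fresh interval.
iptables -Z INPUT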

Again, great question!

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

DDOS

Dave Finnerty's picture

We had a DDoS against an Asterisk box (it was directly on the net, because we could not convince managers to do it differently).

We used fail2ban and it could not keep up.

We then started blocking entire ranges of the Internet, leaving only the US IP ranges in the end and letting F2B handle those.

Your script might have worked for us. F2B waits for failed authentication, while you are looking at active connections.

Dropping the script in my Gmail tools folder. Nice idea.

Thanks,

Dave

Great!

Greg Bledsoe's picture

Feel free to use it if you need to, Dave. I would appreciate it if you feed back any improvements, though. :-)

F2B is really for a different kind of problem, more of a crack-attempt kind of attack. I've also had bad luck trying to block whole geographical regions, as ISPs have a way of shifting blocks around unpredictably as IPv4 space availability tightens.

Glad to put a new tool in the toolbox!

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

A collection of 50 or so of

seemore's picture

A collection of 50 or so of these scenarios, with do-it-yourself replicable code, might make for a very readable book on the subject.

Any book recommendations for how I can get up to understanding all this in the meantime?

Now there's an idea.

Greg Bledsoe's picture

This particular script requires a pretty solid understanding of both basic networking and basic bash scripting. I would suggest starting with some resources designed to get someone up to the CCNA level or equivalent (not necessarily Cisco-focused) and some bash scripting tutorials; for example, go from:

http://www.freeos.com/guides/lsst/

to:

http://tldp.org/LDP/abs/html/

That should keep you busy for a while!

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

Great idea that shows the value of having experienced people

Pablo's picture

Hi, Greg:

That's a great idea and it shows the value of having experienced people on board when you run into problems.

Regarding the script, you could probably use 'uniq -c' instead of 'uniq' to get the count at the same time as the list and avoid having to run lsof so many times.

Kudos to your scripting and networking abilities,
Pablo

That would certainly have been more efficient!

Greg Bledsoe's picture

But under the gun, I just didn't think of it. :-) Thanks for the suggestion! If I ever need to use this again (may it never be!), I'll include it!
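
For illustration, folding the count into the pipeline might look something like this (an untested sketch along the lines Pablo suggests, counting occurrences in one pass instead of re-running lsof per IP):

lsof -ni | grep httpd | grep -iv listen | awk '{print $8}' | cut -d : -f 2 | sed s/"http->"// | sort | uniq -c | \
while read noconns ip ; do
  if [ "$noconns" -gt "10" ] ; then
    iptables -I INPUT -s $ip -p tcp -j REJECT --reject-with tcp-reset ;
  fi ;
done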

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king
