15 October 2009

Misunderstanding the rate limit concept in iptables

One day, I decided to learn more about iptables. This time, I found the "limit" feature interesting enough to try. An excerpt from the iptables manual page:

"limit

This module matches at a limited rate using a token bucket filter. A
rule using this extension will match until this limit is reached
(unless the ‘!’ flag is used). It can be used in combination with the
LOG target to give limited logging, for example."

OK, I couldn't wait to get my hands dirty. I fired up my VM (Virtual Machine) guest and typed this in the guest's console:
# iptables -A INPUT -i eth0 -p icmp -m limit -m icmp --icmp-type echo-request --limit 1/min -j RETURN

To avoid confusion: I assume the default policy of the INPUT chain is ACCEPT, and that the INPUT chain contains no rule other than the one I typed above.

What is the above rule supposed to do? My understanding, at that point, was that it would rate limit ICMP echo request packets to 1 packet per minute, so only 1 packet would be processed during each 1-minute interval and any further packets would be queued in memory, waiting to be processed. My intention in trying this feature was simply to find ideas for preventing DDoS, but my "assumption of queueing" made me think that this was not really safe. If there are 1 million packets waiting to be processed, eventually your machine's memory will be exhausted, no?

But OK, let's put that fear aside. I flood-pinged my VM guest (with ping -f; you have to be root to do this). But guess what? 100% of the ICMP packets were answered, and really fast!!! What's wrong?

Then I ran various tests: replacing the RETURN target with DROP, leaving out -i, not restricting the rule to echo-request, etc. Nothing worked! tcpdump still showed me lots of echo request / echo reply packets flowing back and forth between my host and my VM guest.

I almost concluded that "limit" did not work the way I thought. Perhaps this was a job for the iproute tools, the way we rate limit packets using CBQ, HTB, etc.

But then I smelled something fishy:
# iptables -L -n -v | head

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 RETURN     icmp --  *      *       0.0.0.0/0            0.0.0.0/0           limit: avg 1/min burst 5 icmp type 8


Notice the "pkts" field? It's a counter of how many packets have matched a given rule. Also notice the chain-wide counter shown on the INPUT chain line (inside the parentheses): it counts the packets handled by the chain's default policy.

So, what's special about them? When I repeated my flood ping test, I saw both the limit rule's counter and the INPUT policy counter increase! Thus, something was wrong with my assumption. If the packets had indeed been rate limited, the ACCEPT policy counter would not have grown that fast.

The answer? Back to the manual page. My English skills were really tested this time. "A rule using this extension will match until this limit is reached". Uhuh...I see...

Confused? Let me explain it as simply as I can:
Assume you use 1 packet per minute as the limit. When the first packet arrives, it hits the INPUT chain and is checked against the rate limit rule. Does it match? Of course! We are still not beyond our limit, right? How about the 2nd, 3rd, 4th and 5th? They match too. Why? Because by default there is a burst limit, which lets several initial packets match, but not all. The default burst is 5.

What about the rest? They won't match our limit rule. Again, why? Because the limit has been reached (as the manual page states), so the limit rule is skipped and netfilter checks the next rule. Since there is no other rule in our scenario and the default policy of the INPUT chain is ACCEPT, all ICMP echo request packets are accepted and replied to!
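The burst-then-fall-through behaviour described above can be sketched with a small token bucket simulation. This is only a simplified model written for illustration, not the kernel's actual implementation; the function name simulate_limit and its parameters are made up for this example.

```shell
#!/bin/sh
# Simplified token bucket model (an illustration, not the kernel code).
# Usage: simulate_limit INTERVAL_MS BURST PACKETS GAP_MS
#   INTERVAL_MS  time needed to earn one token (60000 for 1/min)
#   BURST        bucket capacity; the bucket starts full
#   PACKETS      number of packets in the simulated flood
#   GAP_MS       time between two consecutive packets
simulate_limit() {
    interval_ms=$1; burst=$2; packets=$3; gap_ms=$4
    tokens=$burst; credit_ms=0; matched=0; fell_through=0; i=0

    while [ "$i" -lt "$packets" ]; do
        if [ "$i" -gt 0 ]; then
            # Time has passed since the previous packet: refill the bucket,
            # but never beyond its capacity.
            credit_ms=$((credit_ms + gap_ms))
            while [ "$credit_ms" -ge "$interval_ms" ]; do
                credit_ms=$((credit_ms - interval_ms))
                if [ "$tokens" -lt "$burst" ]; then
                    tokens=$((tokens + 1))
                fi
            done
        fi

        if [ "$tokens" -gt 0 ]; then
            # A token is available: the rule matches and its target applies.
            tokens=$((tokens - 1))
            matched=$((matched + 1))
        else
            # Bucket empty: the rule does NOT match, so the packet simply
            # moves on to the next rule (or to the chain policy).
            fell_through=$((fell_through + 1))
        fi
        i=$((i + 1))
    done

    echo "matched=$matched fell_through=$fell_through"
}

# 1000 packets, 10 ms apart, against "1/min burst 5":
simulate_limit 60000 5 1000 10    # prints: matched=5 fell_through=995
```

In this model only the first 5 packets of the flood match the rule. The other 995 are not dropped or queued; they just continue down the chain, which is why every ping still got a reply.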

Solution? Simple. Since the excess packets fall through our limit rule, we need to block them in the very next rule, e.g.:
# iptables -A INPUT -p icmp -m icmp --icmp-type echo-request -j DROP
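Putting the two rules side by side makes the ordering easier to see. The sketch below only prints the commands instead of running them, so it is safe to try without root. The helper name limit_rules is hypothetical, and --limit-burst is spelled out explicitly even though 5 is already the default.

```shell
#!/bin/sh
# Print (do not execute) the limit rule followed by the catch-all DROP rule.
# Usage: limit_rules IFACE RATE [BURST]
limit_rules() {
    iface=$1; rate=$2; burst=${3:-5}
    match="-i $iface -p icmp -m icmp --icmp-type echo-request"

    # Rule 1: matches only while the token bucket still has tokens.
    echo "iptables -A INPUT $match -m limit --limit $rate --limit-burst $burst -j RETURN"

    # Rule 2: everything that fell through rule 1 is dropped here.
    echo "iptables -A INPUT $match -j DROP"
}

limit_rules eth0 1/min
```

To apply the rules for real, the printed lines could be piped into sh as root. The order matters: if the DROP rule came first, no echo request would ever reach the limit rule.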

Voila! We have successfully rate limited ICMP! Woohoo! Case closed.... phewwww

Lesson learned: do not underestimate the manual page. Read it very, very carefully and make sure you understand every word in it. Misunderstanding even a single word can make the difference between a successful implementation and a stressful trial-and-error one. You've been warned...

regards,

Mulyadi

01 October 2009

A little patch that made it into the main Linux kernel git repository

Commit-ID: 1ad0560e8cdb6d5b381220dc2da187691b5ce124
Gitweb: http://git.kernel.org/tip/1ad0560e8cdb6d5b381220dc2da187691b5ce124
Author: Mulyadi Santosa <mulyadi.santosa@gmail.com>
AuthorDate: Sat, 26 Sep 2009 02:01:41 +0700
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 1 Oct 2009 10:12:03 +0200

perf tools: Run generate-cmdlist.sh properly

Right now generate-cmdlist.sh is not executable, so we
should call it as an argument ".".

This fixes cases where due to different umask defaults
the generate-cmdlist.sh script is not executable in
a kernel tree checkout.

Signed-off-by: Mulyadi Santosa <mulyadi.santosa@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <f284c33d0909251201w422e9687x8cd3a784e85adf7d@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 tools/perf/Makefile | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index b5f1953..5881943 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -728,7 +728,7 @@ $(BUILT_INS): perf$X
 common-cmds.h: util/generate-cmdlist.sh command-list.txt
 
 common-cmds.h: $(wildcard Documentation/perf-*.txt)
-	$(QUIET_GEN)util/generate-cmdlist.sh > $@+ && mv $@+ $@
+	$(QUIET_GEN). util/generate-cmdlist.sh > $@+ && mv $@+ $@
 
 $(patsubst %.sh,%,$(SCRIPT_SH)) : % : %.sh
 	$(QUIET_GEN)$(RM) $@ $@+ && \
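
The idea behind the patch can be tried outside the kernel tree. The following is a minimal sketch, assuming a POSIX shell and a writable temporary directory; the file name generate.sh and the function run_demo are hypothetical stand-ins, not part of the perf build.

```shell
#!/bin/sh
# Show that a script without the executable bit cannot be run directly,
# but can still be read and run by the shell's "." (dot) command.
run_demo() {
    demo_dir=$(mktemp -d)
    script="$demo_dir/generate.sh"   # stand-in for util/generate-cmdlist.sh

    printf '%s\n' 'echo "generated"' > "$script"
    chmod 644 "$script"              # readable, but not executable

    # Direct execution needs the executable bit, so this fails.
    if "$script" >/dev/null 2>&1; then
        direct="ok"
    else
        direct="failed"
    fi

    # "." only needs read permission; run it in a subshell to keep the
    # sourced commands from touching the current shell's state.
    sourced=$(. "$script")

    rm -rf "$demo_dir"
    echo "direct=$direct sourced=$sourced"
}

run_demo    # prints: direct=failed sourced=generated
```

This is the same situation the Makefile rule ran into: calling the script by path depends on its file mode, while ". util/generate-cmdlist.sh" does not.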
