09 November 2009

How to catch white space(s) using grep?

My definition: white space characters are anything that shows up as "blank", a.k.a. nothing visible on screen. They include tab, space, carriage return and so on.

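As you know, grep provides a way to match certain character classes or ranges. Specifically for white space, you can use [[:space:]] or [[:blank:]]. The two are not the same: [[:blank:]] matches only space and tab, while [[:space:]] also matches newline, carriage return, vertical tab and form feed. Notice the double [[ and ]] !!!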
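To see the difference between the two classes, here is a small sketch. It assumes a made-up input line that ends in a carriage return (as in DOS-style files); the variable names are only for illustration.

```shell
# The input line is "hehe" followed by a carriage return (\r), then the newline.
# [[:space:]] includes carriage return, so this counts one matching line.
# "|| true" keeps the script going when grep finds nothing (exit status 1).
space_count=$(printf 'hehe\r\n' | grep -c 'hehe[[:space:]]$' || true)

# [[:blank:]] is only space and tab, so the carriage return is not matched.
blank_count=$(printf 'hehe\r\n' | grep -c 'hehe[[:blank:]]$' || true)

echo "space: $space_count, blank: $blank_count"
```

So if you only care about indentation, [[:blank:]] is the tighter choice; [[:space:]] is handy when stray carriage returns may be involved.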

So, suppose you have a text file named test.txt that contains:

     hehe /var/www/
hehe /var/www2/
ttt hehe /var/www3/
     heho /var/www/


$ grep ^hehe test.txt

will yield:

hehe /var/www2/

but this:

$ grep -E '^[[:blank:]]*hehe' test.txt

will yield:

     hehe /var/www/
hehe /var/www2/

In human words, '[[:blank:]]*' matches zero or more occurrences of space or tab before the word "hehe". If you want to require at least one of them, use "+" instead. Oh, and remember to use -E so that "+" keeps its special meaning; without -E, grep treats "+" as a literal plus sign.
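A quick sketch of "*" versus "+" on the same sample data. It recreates the file in a temporary location (the use of mktemp and the variable names are my own choice here, not part of the original example).

```shell
# Recreate the sample file in a temporary location.
sample=$(mktemp)
printf '     hehe /var/www/\nhehe /var/www2/\nttt hehe /var/www3/\n     heho /var/www/\n' > "$sample"

# "*" allows zero or more leading blanks: both the indented and the
# unindented "hehe" lines are counted.
star_count=$(grep -c -E '^[[:blank:]]*hehe' "$sample" || true)

# "+" requires at least one leading blank: only the indented line matches.
plus_matches=$(grep -E '^[[:blank:]]+hehe' "$sample" || true)

echo "lines matched with *: $star_count"
printf '%s\n' "$plus_matches"

rm -f "$sample"
```

With "+", the line "hehe /var/www2/" drops out because nothing blank precedes "hehe" there.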

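Note: initially, I thought I could simply use [:space:] or [:blank:], and I ended up in something-is-wrong-but-I-dont-know-what land. Turns out, I didn't read the man page carefully (poor me). These class names only work inside a bracket expression, so I still need to enclose them in another "[" and "]". Written alone, [:space:] is read as a plain bracket expression that matches any one of the characters ":", "s", "p", "a", "c" or "e" (some newer grep versions reject it with an error instead). Valuable experience.....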



