08 July 2012

Fixing segfault in Pathload

Dear readers

A couple of days ago (the first week of July 2012), I came across a nifty tool called Pathload. Essentially, it helps you measure the real upstream and downstream bandwidth of your connection.

Unfortunately, when I tried to run it, it segfaulted immediately. With a bit of help from gdb and some trial and error, I found and fixed the bug. Here's the complete email message that I sent to its maintainer (who, as far as I can tell, no longer maintains it) describing the problem. For those who just want the patch, scroll to the end of this post (it is in unified diff format):

Dear Constantinos

I came across this nice tool Pathload of yours today while exploring
about network management in Linux kernel. Of course, quickly I
downloaded the link to the source tarball (I use Linux -- Centos 5.x)
and compiled it.

When running it, it suddenly stopped due to segfault. After checking
the stack trace in the resulting core dump image, it leads to line 132
in client.c:

My intuition suddenly told me it must out of bound char copy. Short
story short, I landed to client.h at this line:
#define MAXDATASIZE 25 // max number of bytes we can get at once

I did quick test and change the above line. Now it reads
#define MAXDATASIZE 50

I do "make clean" followed by "make". Now it runs perfectly fine as
far as I can tell.

Hopefully it means something to you for upcoming release. Meanwhile,
once again thank you for this piece of good work.

PS: strangely, without modifying any single line of source code, the
resulting binary worked fine inside GNU debugger (gdb). That's why I
suspected a race condition initially.

--- client.h.old    2012-07-07 11:10:54.000000000 +0700
+++ client.h    2012-07-07 11:10:37.000000000 +0700
@@ -62,7 +62,7 @@
 #define UNCL    4

 #define SELECTPORT 55000 // the port client will be connecting to
-#define MAXDATASIZE 25 // max number of bytes we can get at once
+#define MAXDATASIZE 50 // max number of bytes we can get at once

 EXTERN int send_fleet() ;
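
For readers who want to repeat the diagnosis, here is a minimal sketch of how a stack trace can be pulled out of a core dump with gdb. The helper name and the binary name in the comments are placeholders of mine, not part of Pathload, and the core file name varies between systems (plain "core" or "core.<pid>").

```shell
# Hypothetical helper: print a backtrace from a core dump.
# $1 = path to the crashing binary, $2 = path to the core file.
backtrace_from_core() {
    binary="$1"
    core="$2"
    if [ ! -x "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: backtrace_from_core BINARY COREFILE" >&2
        return 1
    fi
    # -batch makes gdb exit after running the commands given with -ex.
    gdb -batch -ex "bt" "$binary" "$core"
}

# Typical session (not run here):
#   ulimit -c unlimited          # allow core dumps in this shell
#   ./some_pathload_binary       # placeholder name; crashes and leaves a core file
#   backtrace_from_core ./some_pathload_binary core
```

Building with debugging symbols (-g) makes the backtrace show file names and line numbers, which is how a crash can be traced to a specific line in client.c.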

30 March 2012

"useradd" and "adduser" are the same? think again....

Well, actually they are not that different. There is just one small, not-so-obvious but slightly annoying difference.

I did this in Ubuntu Natty (11.04):
sudo useradd -m user_a

and next:
sudo adduser user_b

Of course, I set a password on both of them, let's say "123456" (a weak one, I know :) ). Then, when I did:
su - user_a
I got:
$
Just a plain dollar sign. "Uhm, what's wrong?"

But, if I did:
su - user_b
I got:
user_b@localhost $

Grrrr.... I quickly concluded that something was different in their bash initialization files. So a quick:

sudo diff -Naur /home/user_a/ /home/user_b/
should pinpoint the difference, if there is any, right away. But I was wrong: they were identical.

Then I decided to take a peek at /etc/passwd. No strong reason though, just plain curiosity:
grep -E 'user_a|user_b' /etc/passwd
the result:
[The passwd entries are shortened to focus on the important fields only]

Great! We found it: user_a got /bin/sh as its login shell, while user_b got /bin/bash. "But wait, isn't /bin/sh a symbolic link to /bin/bash?" Well yes, at least some time ago. But on recent releases of Ubuntu and its derivatives, /bin/sh points to "dash".

Dash is a Bash-like shell, but smaller and with fewer features, which makes it incompatible with Bash in many respects. So it's no wonder that ".bashrc" didn't set up the shell prompt, along with other things (Tab completion, IIRC).

Therefore, to fix the useradd behaviour, simply use:
sudo useradd -m -s /bin/bash user_a
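
To check which login shell an account actually got, and what /bin/sh really is on a given machine, something along these lines should do. It is only a sketch: both function names are mine, and the optional second argument exists just so the lookup can be tried on a copy of the passwd file.

```shell
# Print the login shell (7th field) of a user from a passwd-format file.
# $1 = user name, $2 = passwd file (defaults to /etc/passwd).
login_shell_of() {
    user="$1"
    file="${2:-/etc/passwd}"
    awk -F: -v u="$user" '$1 == u { print $7 }' "$file"
}

# Show what /bin/sh really is by resolving symbolic links.
real_sh() {
    readlink -f /bin/sh
}

# Usage (not run here):
#   login_shell_of user_a
#   login_shell_of user_b
#   real_sh
```

For an account that already exists, "sudo usermod -s /bin/bash user_a" changes the login shell without recreating the user.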

Another case closed, folks :)

PS: It's really a wonder how much you can do with grep and diff, if you know where to look ..... :D



Mulyadi Santosa

25 January 2012

" not found"? here we go again...

Hi all...

I've been tinkering with Linux Mint for the last month, so my CentOS installation was kind of abandoned. However, I took the chance to update CentOS via the usual chroot trick. It worked.... however...

I found a glitch. I noticed it when I ran my self-made wifi connection script, which calls the dhclient program. It said that a shared library was "not found".

Great... ldd said the same thing too. However, the library is still in /lib/, so it's not really missing. Hmmmm...

An important note: after the recent update, there is another copy of the library, which resides in /lib/i686/nosegneg. From random googling, I concluded that it is a "Xen-friendly" library. That is a short way of saying that such libraries avoid certain segmentation techniques that might confuse or break Xen, so to speak.

Then, somehow I felt that it *might* be related to SELinux (I run it in enforcing mode). Here are a few lines from /var/log/messages that show the quirk:
kernel: [    5.195941] type=1400 audit(1327499418.190:3):
 avc:  denied  { read } for  pid=860 comm="restorecon" name="" dev=xxxx ino=4821369 scontext=system_u:system_r:restorecon_t:s0 tcontext=system_u:object_r:file_t:s0 tclass=lnk_file

and the output of "ls" is:
$ ls -lZ /lib/
lrwxrwxrwx  root root system_u:object_r:file_t          /lib/ ->
(the above output might be slightly inaccurate; just focus on the "file_t" attribute)

Alright, so the SELinux attribute of the library's symlink in /lib/ is wrong. I don't know exactly what caused that during the chroot session. My best guess is that since the update was done from inside Linux Mint, which doesn't use SELinux, the relabeling (or whatever else is supposed to fix SELinux attributes) simply failed.
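
The log analysis can be sped up a little by pulling just the target contexts out of the denial messages. This is only a sketch: the function name is mine, and it assumes the denials are logged to /var/log/messages, as on my system (reading that file usually needs root).

```shell
# List the distinct target contexts (tcontext=...) of AVC denials in a log.
# $1 = log file (defaults to /var/log/messages).
denied_targets() {
    log="${1:-/var/log/messages}"
    grep 'avc:' "$log" | grep 'denied' \
        | sed -n 's/.*tcontext=\([^ ]*\).*/\1/p' | sort -u
}

# Usage (not run here):
#   sudo sh -c '. ./this_script.sh; denied_targets'
```

A "file_t" type showing up here means the file never got a proper label, which is exactly what happened to me.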

Fortunately, the fix is easy:
1. Edit /etc/sysconfig/selinux and change "SELINUX=enforcing" to "SELINUX=permissive".
2. Run "sudo touch /.autorelabel". Notice the . (dot) prefix.
3. Reboot.

SELinux will relabel everything on your mounted filesystems according to its default configuration once Linux enters its normal runlevel.
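
The three steps above can be sketched as a small script. Treat it as an illustration only: the function name is mine, and the optional second argument is there so the edit can be tried on a copy of the config file first.

```shell
# Set the SELINUX= line of an SELinux config file to the given mode.
# $1 = mode (enforcing|permissive|disabled)
# $2 = config file (defaults to /etc/sysconfig/selinux, as in step 1)
set_selinux_mode() {
    mode="$1"
    cfg="${2:-/etc/sysconfig/selinux}"
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    tmp="$(mktemp)" || return 1
    # Go through a temp file and copy back with cat, so that a
    # symlinked config file stays a symlink.
    sed "s/^SELINUX=.*/SELINUX=$mode/" "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage, as root (not run here):
#   set_selinux_mode permissive
#   touch /.autorelabel
#   reboot
```

The pattern only touches the line that starts with "SELINUX=", so a "SELINUXTYPE=" line in the same file is left alone.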

To confirm your problem is gone, pick a random binary, say dhclient, and run ldd on it. Here's mine:
$  ldd /sbin/dhclient => /lib/i686/nosegneg/

And the problem is solved :) Now you can switch SELinux back to enforcing mode.
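
If you would rather not eyeball the ldd output, a tiny filter can do the checking. Again just a sketch, and the function name is mine:

```shell
# Print the "not found" lines from ldd-style output read on stdin.
# The exit status is grep's: 0 if something is missing, 1 if not.
missing_libs() {
    grep 'not found'
}

# Usage (not run here):
#   ldd /sbin/dhclient | missing_libs || echo "all libraries resolved"
```

To go back to enforcing mode, "sudo setenforce 1" switches the running system, and setting "SELINUX=enforcing" in /etc/sysconfig/selinux again makes it stick across reboots.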

PS: SELinux is both fun and frustrating..... but with careful log analysis, you can usually pinpoint the root of the problem pretty fast.


Mulyadi Santosa