12 June 2007

Copy of feedback Jeff Dike gave me...

Hello everyone. Recently I wrote about GCC on Onlamp and some folks gave me feedback. I believe this feedback will be valuable for everybody, so I am putting it on my blog. The same comment is also posted on Onlamp, but I screwed up the HTML output there. For those who had trouble reading the text there, here is the correct version.



This one I got from Jeff. Sharp criticism... The text written in italics is my original text, followed by his comment.



gcc (GNU C Compiler) is actually a collection of frontend tools that



Actually, gcc == GNU Compiler Collection - the whole family is referred to as gcc.



Preprocessing: Producing code that no longer contains directives. Things like "#if" cannot be understood directly by the compiler, so this must be translated into real code. Macros are also expanded at this stage, making the resulting code larger than the original.



It also pulls in headers.
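To make the three jobs of the preprocessor concrete (resolving directives, expanding macros, pulling in headers), here is a minimal sketch. The names SQUARE, USE_FAST_PATH and compute are made up for this illustration and do not come from the Onlamp article.

```c
#include <stdio.h>   /* the preprocessor pastes the contents of this header here */

#define SQUARE(x) ((x) * (x))   /* macro: expanded textually before compilation */
#define USE_FAST_PATH 1         /* hypothetical switch, for illustration only */

int compute(int n)
{
#if USE_FAST_PATH
    return SQUARE(n);           /* becomes ((n) * (n)) after preprocessing */
#else
    return n * n;               /* dropped entirely while USE_FAST_PATH is 1 */
#endif
}
```

Running gcc -E on a file like this stops after preprocessing and prints the result: the directives are gone, the macro is expanded, and the contents of stdio.h appear ahead of compute().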



..manipulate them further. This work is done in multipass style, which demonstrates that it sometimes takes more than one scan through the source code to optimize.



It doesn't scan the source - it scans the intermediate format, which used to be RTL, but which is something else now.



...As you may already be aware, registers can be accessed hundreds or thousands times faster than RAM cells.



Exaggeration - Maybe ~100 cycles for going out to main memory, but these things will be in cache, so might cost a few cycles.



0x7530 is 30,000 in decimal form, so we can quickly guess the loop is..



0x7530 is hex, "0x7530 is 30,000 in hexadecimal form" or "0x7530 in decimal is 30,000"
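For anyone who wants to check the constants by hand, the conversion is 7*4096 + 5*256 + 3*16 + 0 = 30,000. The small sketch below does the same positional arithmetic; hex_to_decimal is a hypothetical helper written only for this post, not something from the article.

```c
/* Convert a string of hexadecimal digits (without the "0x" prefix)
   to its numeric value. Hypothetical helper, for illustration only. */
unsigned long hex_to_decimal(const char *digits)
{
    unsigned long value = 0;

    for (; *digits != '\0'; digits++) {
        unsigned int d;

        if (*digits >= '0' && *digits <= '9')
            d = (unsigned int)(*digits - '0');
        else if (*digits >= 'a' && *digits <= 'f')
            d = (unsigned int)(*digits - 'a') + 10;
        else if (*digits >= 'A' && *digits <= 'F')
            d = (unsigned int)(*digits - 'A') + 10;
        else
            break;                  /* stop at the first non-hex character */

        value = value * 16 + d;     /* shift one hex place left, add the digit */
    }
    return value;
}
```

Calling it with "7530", "1388" and "11e1a300" gives the decimal values of the three constants that appear in the disassembly.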



simplified. This code represents the innermost loop and the outermost loop ("for(j=0;j<5000;j++) ... for(k=0;k<4;k++)") because that is literally a request to do 30,000 loops. Note that you just need to...



5000 * 4 = 20000 loops.



Author's note: I admit this is solely my own mistake: I confused the number of loop iterations with the amount added to the accumulator (the acc variable). The correct sentence should be "this code represents the middle and the innermost loops (for(j=0;j<5000;j++) ... for(k=0;k<4;k++)). At the end of these loops, the accumulator has increased by 30,000".



To illustrate this better, here is the code with inline comments. First check #1, then #2 and so on to understand the flow.



80483a6: jmp 80483c7 <main+0x37>
80483a8: add $0x7530,%ecx            # 4. acc += 30,000
80483ae: cmp $0x11e1a300,%ecx        # 5. has the accumulator reached 300,000,000?
80483b4: je 80483d0 <main+0x40>
80483b6: jmp 80483c7 <main+0x37>
80483b8: add $0x6,%edx               # 2. the innermost loop.
80483bb: add $0x1,%eax               # EAX is the counter for the middle loop.
80483be: cmp $0x1388,%eax            # 3. have we looped 5,000 times yet?
80483c3: je 80483a8 <main+0x18>
80483c5: jmp 80483b8 <main+0x28>
80483c7: mov %ecx,%edx               # 1. starts here.
80483c9: mov $0x0,%eax
80483ce: jmp 80483b8 <main+0x28>
80483d0: mov %edx,0x4(%esp)          # 6. ready to print.
80483d4: movl $0x80484a0,(%esp)
80483db: call 80482b8 <printf@plt>


So, instead of looping 200,000,000 (10,000 * 5,000 * 4) times as originally written, it now loops only 50,000,000 (10,000 * 5,000) times.
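The arithmetic can be checked with a rough reconstruction in C. This is only a sketch: the loop bounds come from the numbers quoted above, but the loop body (acc += k) is my assumption, chosen because 0+1+2+3 = 6 matches the "add $0x6" in the disassembly. The function names are made up for this illustration.

```c
/* Assumed shape of the original code: three nested loops,
   200,000,000 iterations of the innermost body in total. */
long naive_sum(void)
{
    long acc = 0;

    for (int i = 0; i < 10000; i++)
        for (int j = 0; j < 5000; j++)
            for (int k = 0; k < 4; k++)
                acc += k;           /* assumption: adds 0, 1, 2, 3 */
    return acc;
}

/* What the optimized code effectively does: the k loop is folded
   into a single "add 6", leaving 50,000,000 iterations. */
long folded_sum(void)
{
    long acc = 0;

    for (int i = 0; i < 10000; i++)
        for (int j = 0; j < 5000; j++)
            acc += 6;               /* 0 + 1 + 2 + 3 */
    return acc;
}
```

Each pass over the middle loop adds 5,000 * 6 = 30,000 (0x7530), and 10,000 such passes give 300,000,000 (0x11e1a300), the value the cmp instruction tests for.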



Now, on to parameter passing. In x86 architectures, parameters are pushed to the stack and later popped inside the function for further processing.



Sometimes popped, often they are left on the stack.



By using -mregparm, you basically break the Intel x86-compatible Application Binary Interface (ABI). Therefore, you should mention it when you distribute your software in binary only form.



Why? I see no problem shipping source with Makefiles that say -mregparm. The ABI problem comes if you were to redeclare a library function as regparm and call it.
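Jeff's point, as I understand it, is that trouble only starts when caller and callee disagree about the calling convention. A minimal sketch using GCC's regparm function attribute (the per-function counterpart of -mregparm) might look like this. REGPARM3, add3 and call_add3 are hypothetical names, and the attribute only has meaning on 32-bit x86, so the macro expands to nothing elsewhere.

```c
/* regparm applies to 32-bit x86 only; on other targets the macro is empty. */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* The prototype and the definition carry the same attribute, so every
   caller compiled against this prototype passes arguments the same way. */
REGPARM3 int add3(int a, int b, int c);

REGPARM3 int add3(int a, int b, int c)
{
    return a + b + c;
}

int call_add3(void)
{
    return add3(1, 2, 3);   /* on i386: arguments go in registers, not on the stack */
}
```

The breakage Jeff describes would come from putting such an attribute on the prototype of a function living in an already compiled library: the caller would load registers while the library code still expects its arguments on the stack.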
