- Fotis' Blog - https://fotisl.com/blog -

Inline assembly the gcc way

This is the second time I have written a post about inline assembly. The first one was on my old blog, which I had to remove. Since that was the most viewed article, I think I should write about the topic again. Although this is just a small introduction, you need to know C and assembly to follow it. For the moment I will only describe x86 assembly; the asm construct works the same way on other architectures, although register names and constraints differ. I must also point you to the GCC Inline Assembly HOWTO [1]. I wrote the first post before reading it (the post was ready, but before publishing it I googled a little and found it), and I think it is equally useful.

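We are going to start with a short introduction to AT&T assembly syntax for those of you who are familiar only with Intel syntax. Here are some basic rules:

- Register names are prefixed with %, e.g. %eax.
- Immediate operands are prefixed with $, e.g. $1. Hexadecimal constants are written 0x80 instead of 80h.
- The source operand comes first and the destination second, the opposite of Intel syntax.
- The operand size is given by a suffix on the mnemonic: b for byte, w for word (16 bits) and l for long (32 bits).
- Memory references are written displacement(base, index, scale) instead of [base + index*scale + displacement].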

Here are some examples, first in Intel syntax and then translated to AT&T syntax.
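As a small self-contained sketch of such pairs (the instructions are my own choice for illustration, and att_demo is a hypothetical name), the following C file defines a function inside a top-level asm block. Every line is written in AT&T syntax, with the Intel form in the comment next to it. It assumes an x86 or x86-64 Linux target with GCC or Clang; the asm construct itself is explained further down.

```c
/*
 * Assumption: x86 or x86-64 Linux (ELF symbol names), GCC or Clang.
 * att_demo is a hypothetical function written entirely in AT&T syntax.
 * The comment on each line shows the same instruction in Intel syntax.
 */
__asm__(
    ".text\n"
    ".globl att_demo\n"
    "att_demo:\n"
    "    movl $5, %eax\n"               /* mov eax, 5               */
    "    movl $0x10, %ecx\n"            /* mov ecx, 10h             */
    "    addl %ecx, %eax\n"             /* add eax, ecx             */
    "    leal 2(%eax,%ecx,4), %eax\n"   /* lea eax, [eax+ecx*4+2]   */
    "    ret\n"                         /* ret                      */
);

/* The return value is whatever the code above leaves in %eax. */
int att_demo(void);
```

Compiling this file with gcc -c and running objdump -d on the result shows the same instructions in either syntax.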

If you want to see more examples, you can write your own code and disassemble it using the objdump command. The -d option disassembles any object file or executable you give it. The default syntax is AT&T; to switch to Intel, add -M intel to the command line, e.g. objdump -d -M intel a.out.

We can now move on to putting inline assembly into our programs. To insert assembly code we must use the “asm” construct.

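This will execute a simple nop instruction. In case asm conflicts with something else in your program, you can also use __asm__. Apart from this basic form there is also a more advanced format:

asm(assembly template : output operands : input operands : list of clobbered registers);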

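The assembly template is your actual assembly code, with some special operands that will be replaced by registers or memory locations. These operands are written %0, %1 and so on, numbered in the order in which they appear in the output and then the input list, and a literal register name inside such a template needs two percent signs, e.g. %%eax. The code is sent directly to the assembler, so if you want to run more than one instruction you must use one of the following constructs: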
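The usual separators can be sketched as follows. The three function names are hypothetical; each function adds 2 to its argument with two incl instructions, and only the separator between them changes. The sketch assumes an x86 or x86-64 target and uses the __asm__ spelling so that it also compiles with strict -std= options.

```c
/* Separator 1: a plain newline between the instructions. */
int sep_newline(int x)
{
    __asm__("incl %0\n"
            "incl %0"
            : "=r"(x)
            : "0"(x));
    return x;
}

/* Separator 2: newline plus tab, which keeps the generated .s file tidy. */
int sep_newline_tab(int x)
{
    __asm__("incl %0\n\t"
            "incl %0"
            : "=r"(x)
            : "0"(x));
    return x;
}

/* Separator 3: a semicolon, both instructions on one line. */
int sep_semicolon(int x)
{
    __asm__("incl %0; incl %0"
            : "=r"(x)
            : "0"(x));
    return x;
}
```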



The last one uses ; which is the line separator on most architectures. Since we are only going to use x86 we are free to use it; on some other architectures, like H8/300, ; is the comment character. You can check the gas info page for more information about the line separator character.

The output and input operands are two lists that have the form

"constraint"(variable), "constraint"(variable), ...

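For the output operands, a constraint denotes where the value stored into variable is taken from after the assembly code has executed; for the input operands, a constraint denotes where variable is placed before the code executes. The most common constraints for the x86 architecture are:

- r: any general purpose register
- a: the %eax register
- b: the %ebx register
- c: the %ecx register
- d: the %edx register
- S: the %esi register
- D: the %edi register
- m: a memory operand
- i: an immediate integer operand
- 0 to 9: a matching operand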

The memory operand does not use a register; the instruction accesses the variable directly at its location in memory. The matching operand is used when the same register serves as both input and output, e.g. by specifying 0 as an input constraint, the register of the first output operand is also used for that input. You must prefix output constraints with the = character. If you use the r constraint more than once, the compiler may give an input the same register as an output, because it assumes that all inputs are read before any output is written. If your code writes an output earlier than that, mark it with the & character (early clobber), i.e. the constraint =&r, so that it gets a register of its own.
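A short sketch of these constraints in use, assuming an x86 or x86-64 target. All four function names are hypothetical and exist only to show one constraint each:

```c
/* "=&r": sum is written by the first instruction, before b (%2) has
   been read, so it must not share a register with an input. */
int add_regs(int a, int b)
{
    int sum;
    __asm__("movl %1, %0\n\t"
            "addl %2, %0"
            : "=&r"(sum)
            : "r"(a), "r"(b));
    return sum;
}

/* "0": the input uses the same register as output operand 0. */
int negate(int x)
{
    __asm__("negl %0" : "=r"(x) : "0"(x));
    return x;
}

/* "m": the input is read straight from memory, not from a register. */
int read_mem(int v)
{
    int out;
    __asm__("movl %1, %0" : "=r"(out) : "m"(v));
    return out;
}

/* "a": both the input and the output are pinned to %eax. */
int times_two(int x)
{
    int r;
    __asm__("addl %%eax, %%eax" : "=a"(r) : "a"(x));
    return r;
}
```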

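Finally, the list of clobbered registers names every register that your assembly code modifies and that is not already listed as an output operand. Each register is given as a string such as "%eax" or "%ebx"; GCC also accepts the name without the %.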
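As an illustration, here is a sketch in which the assembly code uses %ecx as a scratch register. Because %ecx is neither an input nor an output, it has to appear in the clobber list so that the compiler keeps its own values out of it. The function name add_via_ecx is hypothetical, and an x86 or x86-64 target is assumed.

```c
/* Adds b to a, going through %ecx as a temporary register. */
int add_via_ecx(int a, int b)
{
    __asm__("movl %1, %%ecx\n\t"   /* copy b into the scratch register */
            "addl %%ecx, %0"       /* a += scratch register            */
            : "=r"(a)
            : "r"(b), "0"(a)
            : "%ecx");             /* tell the compiler %ecx is modified */
    return a;
}
```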

If your code must execute exactly at the place you put it, you must put the volatile keyword after asm, or __volatile__ after __asm__. Otherwise, the compiler may move it, or even remove it if its outputs are unused, when optimization takes place.

I will present some simple examples.

In the first example I will read the timestamp counter. This is done with the rdtsc instruction, which returns the low 32 bits of the result in eax and the high 32 bits in edx.
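A minimal sketch of such a read, assuming an x86 or x86-64 target. On 32-bit x86 the A constraint can bind a single unsigned long long to the edx:eax pair; the version here uses two 32-bit outputs instead, because that form also works on x86-64. The function name read_tsc is hypothetical.

```c
/* Returns the 64-bit timestamp counter. */
unsigned long long read_tsc(void)
{
    unsigned int lo, hi;

    /* rdtsc leaves the low half in %eax and the high half in %edx. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));

    return ((unsigned long long)hi << 32) | lo;
}
```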

The generated code will be:

-16(%ebp) and -12(%ebp) are the two 32-bit words that hold the unsigned long long variable tsc, which receives the timestamp counter.

The second example will read and print some cpuid information.
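A sketch of the part that reads the processor brand, written as a hypothetical helper cpu_brand that fills a buffer instead of printing. It assumes an x86 or x86-64 CPU that supports the extended cpuid leaves 0x80000002 to 0x80000004; a careful program would first check the highest supported leaf with op 0x80000000.

```c
#include <string.h>

/* Copies the 48-byte brand string into out (at least 49 bytes) and
   terminates it. Each cpuid call returns 16 bytes in eax..edx. */
void cpu_brand(char out[49])
{
    unsigned int op, eax, ebx, ecx, edx;
    char *p = out;

    for (op = 0x80000002; op <= 0x80000004; op++) {
        __asm__ __volatile__("cpuid"
                             : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                             : "a"(op));
        memcpy(p,      &eax, 4);
        memcpy(p + 4,  &ebx, 4);
        memcpy(p + 8,  &ecx, 4);
        memcpy(p + 12, &edx, 4);
        p += 16;
    }
    *p = '\0';
}
```

Calling cpu_brand(buf) and then printf("%s\n", buf) prints the brand string.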

I will only present the second part of the program which prints the processor brand.

Location -32(%ebp) stores the op variable. At the beginning it is assigned the value 0x80000002. I will not comment on the code generated for the loop; it is pretty simple. After the .L3 label the inline assembly begins. First, the op value is moved to the register %eax, since it is in the input operands list with the constraint a. The cpuid instruction is then executed, since it is the assembly template. Finally, all registers in the output operands list are saved to the appropriate locations, i.e. -28(%ebp) for %eax, -24(%ebp) for %ebx, -20(%ebp) for %ecx and -16(%ebp) for %edx.

You can find more information by writing your own programs and disassembling them using the objdump utility.

If you like this post I will continue with something more advanced.