Data initialization in C++

In this blog post, I am going to review the different kinds of data and how they are initialized in a program.

What I am going to explain here is valid for Linux and GCC.

Code Example

I'll just start by showing a small piece of code. What is going to interest us is where the data will end up in memory and how it is initialized.

#include <QString>

const char string_data[] = "hello world"; // .rodata
const int even_numbers[] = { 0*2, 1*2, 2*2, 3*2, 4*2 }; // .rodata

int all_numbers[] = { 0, 1, 2, 3, 4 }; // .data

static inline int odd(int n) { return n*2 + 1; }
const int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) }; // initialized at run time

QString qstring_data("hello QString"); // object with constructor and destructor

I'll analyze the assembly. It has been generated with the following command, then re-formatted for better presentation in this blog post.

g++ -O2 -S data.cpp

(I also had to add a function that uses the data, so that the compiler does not remove the arrays that would otherwise be unused.)
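
For illustration, such a function could look like the sketch below. The name use_data is hypothetical, and this is not necessarily the function that was used to produce the listings in this post. The idea is only that indexing the arrays with a value known at run time, from a function with external linkage, forces the compiler to keep them. The QString is left out so that the snippet does not depend on Qt.

```cpp
const char string_data[] = "hello world";
const int even_numbers[] = { 0*2, 1*2, 2*2, 3*2, 4*2 };

int all_numbers[] = { 0, 1, 2, 3, 4 };

static inline int odd(int n) { return n*2 + 1; }
const int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) };

// Hypothetical helper: reads one element of every array.
// The index is only known at run time, so the arrays cannot be dropped.
// Valid indices are 0 to 4.
int use_data(int i) {
    return string_data[i] + even_numbers[i] + all_numbers[i] + odd_numbers[i];
}
```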

The sections

On Linux, binaries (programs or libraries) are stored as files in the ELF format. Those files are composed of many sections. I'll just go over a few of them:

The code: .text

This section is the actual code of your library or program: it contains all the instructions for each function. That part of the file is mapped into memory and shared between the processes that use it (provided the library is compiled as position-independent code, which is usually the case).

I am not interested in the code in this blog post, so let us move on to the data sections.

The read-only data: .rodata

This section will be loaded the same way as the .text section is loaded. It will also be shared between processes.

It contains the arrays that are marked as const such as string_data and even_numbers.

.section    .rodata
    .string "hello world"
    .long   0
    .long   2
    .long   4
    .long   6
    .long   8

You can see that even though the even_numbers array was initialized with multiplications, the compiler was able to compute the values and generate the array at compile time.

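The _ZL11 that is part of symbol names such as _ZL11odd_numbers comes from name mangling: _Z introduces a mangled name, L marks internal linkage (which const variables at namespace scope have by default), and 11 is the length of the identifier.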

Writable data: .data

The .data section contains the pre-initialized data that is not read-only.
This section is not shared between processes but copied for each process that uses it. (Actually, with the copy-on-write optimization in the kernel, it might need to be copied only if the data changes.)

This is where our all_numbers array goes, since it has not been declared as const.

    .long   0
    .long   1
    .long   2
    .long   3
    .long   4
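
One way to observe the difference between .rodata and .data in a running process is to look up the addresses of the arrays in /proc/self/maps. The following is a minimal sketch, assuming Linux; mapping_permissions is a hypothetical helper written for this illustration. The const array would be expected to sit in a mapping without the w flag, and all_numbers in a writable one.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

const int even_numbers[] = { 0, 2, 4, 6, 8 };  // expected in .rodata
int all_numbers[] = { 0, 1, 2, 3, 4 };         // expected in .data

// Hypothetical helper: returns the permission field (for example "r--p" or
// "rw-p") of the memory mapping that contains addr, or an empty string if
// /proc/self/maps is not available or no mapping matches.
std::string mapping_permissions(const void* addr) {
    const std::uintptr_t target = reinterpret_cast<std::uintptr_t>(addr);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        // Each line starts with "start-end perms", with addresses in hex.
        std::istringstream fields(line);
        std::string range, perms;
        if (!(fields >> range >> perms)) continue;
        const std::string::size_type dash = range.find('-');
        if (dash == std::string::npos) continue;
        const unsigned long long start = std::stoull(range.substr(0, dash), nullptr, 16);
        const unsigned long long end = std::stoull(range.substr(dash + 1), nullptr, 16);
        if (target >= start && target < end) return perms;
    }
    return std::string();
}
```

Calling mapping_permissions(even_numbers) and mapping_permissions(all_numbers) and comparing the second character of each result shows whether the corresponding page is writable.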

Initialized at run-time: .bss + .ctors

The compiler was not able to turn the calls to odd() into a compile-time initialization: the array has to be filled in at run time. Where will our odd_numbers array be stored?

It will not be stored in the binary; instead, some space will be reserved in the .bss section. That section is just memory that is allocated to each process and initialized to 0.
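
The simplest case of data that would be expected to end up in .bss is a global array without an initializer. The sketch below uses hypothetical names (scratch_buffer, all_zero) and only checks the zero-initialization that C++ guarantees for objects with static storage duration; where the linker actually places the array is an assumption about GCC on Linux.

```cpp
#include <cstddef>

// No initializer: objects with static storage duration are zero-initialized,
// so no bytes need to be stored in the binary for this array.
int scratch_buffer[1024];

// Returns true if every element in [data, data + count) is zero.
bool all_zero(const int* data, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (data[i] != 0) return false;
    }
    return true;
}
```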

The binary also contains a section with code that is executed before main() is called.

.section    .text.startup
    movl    $1, _ZL11odd_numbers(%rip)
    movl    $3, _ZL11odd_numbers+4(%rip)
    movl    $5, _ZL11odd_numbers+8(%rip)
    movl    $7, _ZL11odd_numbers+12(%rip)
    movl    $9, _ZL11odd_numbers+16(%rip)

.section    .ctors,"aw",@progbits
    .quad   _GLOBAL__sub_I_odd_numbers

.local  _ZL11odd_numbers  ; reserve 20 bytes in the .bss section
    .comm   _ZL11odd_numbers,20,16

The .ctors section contains a table of pointers to functions that are going to be called by the loader before it calls main(). In our case, there is only one: the code that initializes the odd_numbers array.
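
The effect of such a start-up function can be observed from C++ itself. In the sketch below, which uses hypothetical names, the initializer of startup_time calls a function whose result is only known at run time, so the variable cannot be placed pre-initialized in the binary. By the time main() starts, the counter shows that the initializer has already run once.

```cpp
#include <ctime>

// Constant-initialized to 0 before any run-time initialization happens.
static int startup_calls = 0;

// Hypothetical initializer: its result depends on the clock, so the
// compiler cannot evaluate it at compile time.
static std::time_t read_clock() {
    ++startup_calls;
    return std::time(nullptr);
}

// Initialized at run time, before main() is entered.
const std::time_t startup_time = read_clock();

// Number of times the initializer has run so far.
int startup_call_count() { return startup_calls; }
```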

Global Object

How about our QString? It is a global C++ object with a constructor and destructor. It is simply initialized by running the constructor at start-up.

.section    .rodata.str1.1,"aMS",@progbits,1
    .string "hello QString"

.section    .text.startup,"ax",@progbits
       ; QString constructor (inlined)
    movl    $-1, %esi
    movl    $.LC0, %edi
    call    _ZN7QString16fromAscii_helperEPKci
    movq    %rax, _ZL12qstring_data(%rip)
       ; register the destructor
    movl    $__dso_handle, %edx
    movl    $_ZL12qstring_data, %esi
    movl    $_ZN7QStringD1Ev, %edi
    jmp __cxa_atexit   ; (tail call)

Here is the code of the constructor, which has been inlined.

We can also see that the code calls the function __cxa_atexit with the parameters $_ZL12qstring_data and $_ZN7QStringD1Ev, which are respectively the address of the QString object and a function pointer to the QString destructor. In other words, this code registers the destructor of QString to be run on exit.
The third parameter, $__dso_handle, is a handle to this dynamic shared object (used, for example, to run the destructor when a plugin is unloaded).

What is the problem with global objects with constructor?

  • The order in which the constructors are called is not specified by the C++ standard for objects defined in different translation units. If you have dependencies between your global objects, you will run into trouble.
  • All the constructors of all the globals in all the libraries need to run before main(), which slows down the startup of the application (even for objects that will never be used).

This is why it is not recommended to have global objects in libraries. Instead, one can use function-static objects, which are initialized on first use. (Qt provides a macro for that: Q_GLOBAL_STATIC, which is made public in Qt 5.1.)
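
As a minimal sketch of the function-static approach in plain C++ (this is not how Q_GLOBAL_STATIC is implemented, and Settings and settings() are hypothetical names), the object is wrapped in an accessor function. Its constructor then runs the first time the accessor is called instead of at start-up, and since C++11 that first initialization is thread-safe.

```cpp
#include <string>

// Counts constructor calls so the lazy behaviour can be observed.
static int constructions = 0;

// Hypothetical class standing in for an expensive global object.
struct Settings {
    std::string name;
    Settings() : name("hello") { ++constructions; }
};

// The object is constructed on the first call only; later calls return
// a reference to the same instance.
Settings& settings() {
    static Settings instance;
    return instance;
}

int construction_count() { return constructions; }
```

A program that never calls settings() never pays for the constructor, and the order of initialization follows the order of first use.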

Here comes C++11

C++11 comes with a new feature: constexpr

That keyword can be used in two ways. If you declare a function constexpr, it means that the function can be evaluated at compile time.
If you declare a variable constexpr, it means that its value is computed at compile time.

Let us slightly modify the example above and see what it does:

static inline constexpr int odd(int n) { return n*2 + 1; }
constexpr int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) };

Two constexpr keywords were added.

.section    .rodata
    .long   1
    .long   3
    .long   5
    .long   7
    .long   9

Now the values are generated at compile time and stored in .rodata.
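
A way to check this without reading the assembly is static_assert, which only accepts expressions that the compiler can evaluate. The sketch below repeats the constexpr version of the example and adds a few such checks; the messages are only illustrative.

```cpp
constexpr int odd(int n) { return n*2 + 1; }
constexpr int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) };

// These checks are evaluated by the compiler; a wrong value is a build error.
static_assert(odd_numbers[0] == 1, "first odd number");
static_assert(odd_numbers[4] == 9, "last odd number");
static_assert(sizeof(odd_numbers) / sizeof(odd_numbers[0]) == 5, "five entries");
```

With the non-constexpr version of odd(), the same static_assert lines would be rejected, because the elements of the array would not be constant expressions.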

If a class has a constructor that is declared constexpr and has no destructor, you can use it for a global object, and that object will be initialized at compile time.
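
Here is a small sketch of such a class. Point is a hypothetical type, not something from Qt. Its constructor is constexpr and its destructor is the implicit, trivial one, so both a constexpr global and an ordinary writable global can be initialized without any start-up code.

```cpp
#include <type_traits>

// Hypothetical class with a constexpr constructor and no user-written destructor.
struct Point {
    int x;
    int y;
    constexpr Point(int x_, int y_) : x(x_), y(y_) {}
};

static_assert(std::is_trivially_destructible<Point>::value,
              "no destructor has to be registered at start-up");

// Read-only global, usable in constant expressions.
constexpr Point fixed_point(3, 4);
static_assert(fixed_point.x == 3 && fixed_point.y == 4, "computed at compile time");

// Writable global: the constructor call is a constant expression, so the
// values can be stored pre-initialized instead of being set before main().
Point start_point(1, 2);
```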

Since Qt 4.8, there is a macro Q_DECL_CONSTEXPR which expands to constexpr if the compiler supports it, or to nothing otherwise.
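
The idea behind such a macro can be sketched in a few lines. MY_DECL_CONSTEXPR below is a hypothetical stand-in and not Qt's actual definition, which relies on Qt's own compiler detection; here the check is simply assumed to be the value of __cplusplus.

```cpp
// Hypothetical macro in the spirit of Q_DECL_CONSTEXPR.
#if __cplusplus >= 201103L
#  define MY_DECL_CONSTEXPR constexpr
#else
#  define MY_DECL_CONSTEXPR
#endif

// Compiles with or without C++11 support; it can only be evaluated at
// compile time when the macro expands to constexpr.
MY_DECL_CONSTEXPR inline int odd(int n) { return n*2 + 1; }
```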


Article posted by Olivier Goffart on 16 May 2013

© 2018 Woboq GmbH