Symbols and Symbols File

February 13, 2020 on tomleb's blog

While exploring the wlroots project, I came across the file wlroots.syms, which has an extension I had not seen before in C projects. This file is a version script, and the linker uses it to determine which symbols a dynamic library exports.

The example below shows what a version script, or symbol file, looks like. It is taken from the wlroots project.

WLROOTS_0_0_0 {
	global:
		wlr_*;
		_wlr_log;
		_wlr_vlog;
		_wlr_strip_path;
	local:
		wlr_signal_emit_safe;
		*;
};

Before going over its syntax and its meaning, we will explain what symbols are and how visibility affects the use of symbols in a code base.

Symbols

A symbol is any user-made identifier that can be referenced when programming.1 For example, functions, global variables and constants are symbols.

There are a few rules that restrict when a symbol can be referenced in a source code file. It depends on the visibility of the symbol, which can be either global (external) or local (internal). But external to what?

When compiling a C program, each *.c file is compiled into a corresponding *.o object file. These object files are then linked together by the linker to create an executable or a dynamic library….

Importance of Symbol Visibility

When creating a shared library, we want to export only the symbols that should be used by the users of the library.

This is similar to encapsulation in OOP where the public data and methods form the application programming interface (API) of a class. This makes it easy for developers of a class to change the underlying implementation without breaking users of the class.

For a shared library, the interface that is exposed to the users is called an application binary interface (ABI). By hiding symbols, we avoid possible future breakage for users of our library.

Another reason to hide symbols is to avoid conflicts with other libraries’ symbols. If two dynamically linked libraries export the same symbol, only one of the definitions will be available at runtime.

Finally, according to Controlling Symbol Visibility, hiding symbols also reduces the size of the dynamic symbol table. This is supposed to improve the startup time, but I was not able to find much information about it. In the wlroots pull request #647, we see that the startup time went from 42,604,509 cycles to 27,169,614 cycles, which, according to ddevault, would save about 0.001 seconds.2

Visibility for Object Files

We can control the visibility of symbols in an object file by using the static keyword. All top-level symbols are global by default. A symbol declared static is local, and can only be accessed within that object file.

For example, let’s write a simple file foo.c with the following content.

static void foo_local(void) {
	// do something
}

void foo_global(void) {
	// do something
}

We can then compile this file into an object file. This creates the file foo.o.

$ cc -c foo.c

Finally, we can list the symbols that the object file contains using the nm command line utility.

$ nm foo.o
0000000000000007 T foo_global
0000000000000000 t foo_local

Disregarding the first column, the second column tells you the symbol type and the third column, the symbol name. The symbol is global when its type is an uppercase letter and local when it is lowercase.

In this case, foo_global is external and foo_local is internal. This means that foo_local can only be used in the foo.c file. For example, create a file named main.c with the following content.

extern void foo_global(void);
extern void foo_local(void);

int main(int argc, char *argv[]) {
	foo_global();
	foo_local();
}

If we try to compile it and link the files to make an executable, the link step will fail because we are trying to reference a symbol that is not external. However, if we remove the lines referring to foo_local and follow the same compilation steps as below, the program will build successfully.

$ cc -c main.c
$ cc main.o foo.o -o main
/usr/bin/ld: main.o: in function `main':
main.c:(.text+0x15): undefined reference to `foo_local'
collect2: error: ld returned 1 exit status
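The static keyword works the same way for variables as it does for functions. As a minimal sketch, assume a hypothetical file kinds.c with the content below (the file name and all identifiers are made up for illustration). Compiled with cc -c kinds.c, GNU nm would typically list the two functions under T and t and the two initialized variables under D and d, although the exact letters depend on the toolchain and the optimization level.

```c
/* kinds.c: a hypothetical file for experimenting with nm. */

int counter_global = 1;        /* external, initialized data */
static int counter_local = 1;  /* internal, initialized data */

/* Internal function: only code in this file can call it. */
static int bump_local(void) {
	return ++counter_local;
}

/* External function: other object files can call it. */
int bump_global(void) {
	counter_global += 1;
	return bump_local();
}
```

Another object file can reach counter_local only indirectly, through bump_global, which is the same encapsulation idea applied at the level of a single file.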

Sometimes object-level visibility is not enough, and that is where being able to hide symbols at the library level comes in handy.

Visibility for Libraries

When developing a library, we often write code that should only be used internally. To make the code easier to maintain, the internal and non-internal code is usually split into different files. Thus, we end up with multiple object files.

We link all of the object files together to create a library. The library will export all the external symbols of the object files. This means that we will have external symbols of internal code. This breaks encapsulation because we are making the internals of our library available to the users.

Let’s consider a simple example: we want to create a library called foo. The code is split across foo.h, foo.c, bar.h and bar.c where the bar* files are used and must be used only by foo*.

// foo.h
void foo(void);

// foo.c
#include "bar.h"
#include "foo.h"

void foo(void) {
	bar();
}

// bar.h
void bar(void);

// bar.c
#include "bar.h"

void bar(void) {
	// do something
}

We can compile a shared library with the following commands.

$ cc -c -fPIC bar.c
$ cc -c -fPIC foo.c
$ cc -shared bar.o foo.o -o foo.so

Again, we look at the symbols with the nm command.

$ nm foo.so
0000000000001115 T bar
...
0000000000001109 T foo
...

As expected, the symbol foo is exported as a global symbol. The interesting thing is that the symbol bar is also exported as a global symbol. As you can see, the C language by itself does not allow us to control the visibility at the library level. This is where we need to reach for the version script seen above, with the help of the linker.3
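As an aside, the compiler attribute mentioned in footnote 3 can be sketched as follows. This is a single-file stand-in for foo.c and bar.c, not the approach used in the rest of this post. The macro name FOO_INTERNAL, the call counter and the foo_bar_calls helper are assumptions added for illustration, and the attribute itself is only understood by GCC-compatible compilers.

```c
/* Single-file stand-in for foo.c and bar.c. FOO_INTERNAL is a
 * hypothetical macro name, not something from the original example. */
#if defined(__GNUC__) || defined(__clang__)
#define FOO_INTERNAL __attribute__((visibility("hidden")))
#else
#define FOO_INTERNAL /* no equivalent: the symbol keeps default visibility */
#endif

static int bar_calls; /* counts calls so the effect is observable */

/* Internal helper: still has external linkage, so other object files of
 * the library can call it, but it is marked as hidden for the library. */
FOO_INTERNAL void bar(void) {
	bar_calls++;
}

/* Public entry point: default visibility. */
void foo(void) {
	bar();
}

/* Hypothetical helper, only here to make the sketch checkable. */
int foo_bar_calls(void) {
	return bar_calls;
}
```

Built as a shared library with GCC or Clang, bar would be expected to show up in nm with a lowercase t, while foo stays an uppercase T.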

Version Scripts

A version script is a file that describes what should be exported. When creating a dynamic library, we can ask the linker to control the visibility with the version script file.

Following the last example, here is a version script that keeps foo as global, but hides all symbols with a prefix of bar.

Create the file foo.syms with the following content.

VERS_1.1 {
	global:
		foo;
	local:
		bar*;
};

Next, we need to tell the linker to create the dynamic library with the version script foo.syms. Here is the command to achieve this.

$ cc -shared bar.o foo.o -Wl,--version-script,foo.syms -o foo.so

We look again at the symbols of foo.so.

$ nm foo.so
00000000000010f5 t bar
...
00000000000010e9 T foo
...

Finally, bar is a local symbol and cannot be directly referenced by a source code file that uses the library. This means the foo library properly hides its internals, making it easier to change the implementation later on.
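One way to check this from a program's point of view is to look the symbols up at runtime. The sketch below uses the POSIX dlopen and dlsym functions. The function name symbol_is_exported and its return convention are assumptions made for this example, and on older glibc versions the program may need to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if `name` can be looked up through the shared library at
 * `path`, 0 if the library loads but the lookup fails, and -1 if the
 * library itself cannot be loaded. */
int symbol_is_exported(const char *path, const char *name) {
	void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
	if (handle == NULL) {
		return -1;
	}
	/* dlsym searches the dynamic symbols of the library and of its
	 * dependencies, so a symbol made local by the version script is
	 * not expected to be found. */
	int found = dlsym(handle, name) != NULL;
	dlclose(handle);
	return found;
}
```

With the foo.so built above, symbol_is_exported("./foo.so", "foo") should return 1 and symbol_is_exported("./foo.so", "bar") should return 0. Without the version script, both lookups would be expected to succeed.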

Meson

To add a version script to an existing project built with Meson, you need to pass a link_args keyword argument to the shared_library call.

For example, here is a short snippet that builds a library with the symbols file foo.syms.

...
symbols_file = 'foo.syms'
symbols_flag = '-Wl,--version-script,@0@/@1@'.format(meson.current_source_dir(), symbols_file)

foo_lib = shared_library(
  'foo',
  files(
    'src/bar.c',
    'src/foo.c',
  ),
  include_directories: ['include'],
  link_args : symbols_flag,
)

For a more complete example, you can look at the meson_version_script repository.

Versioning

As you can see in the version script at the beginning of this post, the first thing in the file is a name that contains a version number. This version name can be used to make one version in a version script depend on another. I am more interested in symbol visibility, so I will not go into detail about this. For more information, see Version Script.

  1. Close enough 

  2. On his 3.5GHz computer 

  3. There is also the __attribute__((visibility("hidden"))) attribute that can be used. However, this depends on GNU extensions and thus is not portable. 
