Embedding a File in an Executable, aka Hello World, Version 5967

June 12th, 2008 by Mitch Frazier

I recently needed to embed a file in an executable. Since I work at the command line with gcc and friends, not with a fancy RAD tool that makes it all happen magically, it wasn't immediately obvious to me how to do it. A bit of searching on the net turned up a hack: cat the file onto the end of the executable, then work out where it landed based on a bunch of information I didn't want to know about. It seemed like there ought to be a better way...

And there is: objcopy to the rescue. objcopy converts object files or executables from one format to another. One of the formats it understands is "binary", which is basically any file that's not in one of the other formats it understands. You've probably guessed the idea already: convert the file we want to embed into an object file, and then it can simply be linked in with the rest of our code.

Let's say we have a file named data.txt that we want to embed in our executable:

  # cat data.txt
  Hello world
To convert this into an object file that we can link with our program we just use objcopy to produce a ".o" file:
  # objcopy --input binary \
            --output elf32-i386 \
            --binary-architecture i386 data.txt data.o
This tells objcopy that our input file is in the "binary" format and that our output file should be in the "elf32-i386" format (the format of object files on the x86). The --binary-architecture option tells objcopy that the output file is meant to "run" on an x86. This is needed so that ld will accept the file for linking with other files for the x86. One would think that specifying the output format as "elf32-i386" would imply this, but it does not.

Now that we have an object file we only need to include it when we run the linker:

  # gcc main.c data.o
When we run the result we get the prayed-for output:
  # ./a.out
  Hello world
Of course, I haven't told the whole story yet, nor shown you main.c. When objcopy does the above conversion it adds some "linker" symbols to the converted object file:
   _binary_data_txt_start
   _binary_data_txt_end
After linking, these symbols specify the start and end of the embedded file. The symbol names are formed by prepending _binary_ and appending _start or _end to the file name. If the file name contains any characters that would be invalid in a symbol name, they are converted to underscores (e.g., data.txt becomes data_txt). If you get unresolved names when linking with these symbols, run hexdump -C on the object file and look at the end of the dump for the names that objcopy chose.
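
The naming rule above can be sketched in C. This is only an illustration: embedded_symbol is a hypothetical helper, not part of binutils, and it assumes that every character that is not a letter or digit becomes an underscore and that the name is the input path exactly as it was typed on the objcopy command line.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of binutils): writes
 * "_binary_<name>_<suffix>" into out, replacing every character of
 * <name> that is not a letter or digit with '_'.
 * Returns 0 on success, -1 if out is too small. */
int embedded_symbol(const char *name, const char *suffix,
                    char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "_binary_%s_%s", name, suffix);
    if (n < 0 || (size_t)n >= outsz)
        return -1;

    /* Mangle only the file-name part of the symbol. */
    size_t first = strlen("_binary_");
    size_t len = strlen(name);
    for (size_t i = first; i < first + len; i++)
        if (!isalnum((unsigned char)out[i]))
            out[i] = '_';
    return 0;
}
```

Called with "data.txt" and "start", the helper leaves _binary_data_txt_start in out. If the result doesn't match what the linker complains about, trust the names in the object file over this sketch.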

The code to actually use the embedded file should now be reasonably obvious:

#include <stdio.h>

extern char _binary_data_txt_start;
extern char _binary_data_txt_end;

int main(void)
{
    /* The symbols hold no data themselves; their addresses mark the data. */
    char*  p = &_binary_data_txt_start;

    while ( p != &_binary_data_txt_end ) putchar(*p++);
    return 0;
}
One important and subtle thing to note is that the symbols added to the object file aren't "variables". They don't contain any data; rather, their address is their value. I declare them as type char because it's convenient for this example: the embedded data is character data. However, you could declare them as anything: as int if the data is an array of integers, or as struct foo_bar_t if the data is an array of foo bars. If the embedded data is not uniform, then char is probably the most convenient: take its address and cast the pointer to the proper type as you traverse the data.
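
Because only the addresses matter, the size of the embedded data is just the difference between the two addresses. A minimal sketch of that idea follows; embedded_size, embedded_write and sum_ints are hypothetical helper names, and sum_ints assumes the data is an array of int in the host's byte order.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Number of bytes between the start and end addresses. */
size_t embedded_size(const char *start, const char *end)
{
    return (size_t)(end - start);
}

/* Write the whole embedded range to out in one call.
 * Returns the number of bytes written. */
size_t embedded_write(const char *start, const char *end, FILE *out)
{
    return fwrite(start, 1, embedded_size(start, end), out);
}

/* Treat the range as an array of int and add the elements up.
 * memcpy is used so no alignment of the data is assumed;
 * trailing bytes that don't fill a whole int are ignored. */
long sum_ints(const char *start, const char *end)
{
    long total = 0;
    const char *p = start;

    while ((size_t)(end - p) >= sizeof(int)) {
        int v;
        memcpy(&v, p, sizeof v);
        total += v;
        p += sizeof(int);
    }
    return total;
}
```

With data.o linked in, the calls would look like embedded_write(&_binary_data_txt_start, &_binary_data_txt_end, stdout), using the same extern declarations as in main.c.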

__________________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.


On June 18th, 2008 stabu (not verified) says:

whoa the version number on this article!

For 64-bit x86, use --output elf64-x86-64. The --binary-architecture option need not change, again somewhat unintuitively.

The version number is the version of the "hello world" program, not the article. And could somebody please come up with a new standard first program? If I see "hello world" in one more language I'm gonna spit up :).

__________________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.

so much stuff for little problem...

On June 16th, 2008 Anonymous (not verified) says:

man xxd for "xxd -i":

cat input_file | ( echo "unsigned char xxx[] = {"; xxd -i; echo "};" ) > output_file.c

There is another, portable way to do this

On June 13th, 2008 Hans Bezemer (not verified) says:

I was facing exactly the same problem when I wanted to embed 4tH bytecode into an executable. The trick is to convert the file into a C-file that can be compiled properly with any C compiler. 4tH features a program to do that. In essence it works like this: you read the file in binary mode byte by byte and convert those bytes to unsigned characters. A converted file looks like this:

static unit HelloWorld [] = {
  '\x01', '\x02', '\x04', '\x00', '\xff', '\xff', '\xff', '\x7f', '\x04',
  '\x5c', '\x03', '\x08', '\x02', '\x02', '\x02', '\x0d', '\x08', '\x08',
  '\x08', '\x05', '\x08', '\x02', '\x48', '\x65', '\x6c', '\x6c', '\x6f',
  '\x20', '\x77', '\x6f', '\x72', '\x6c', '\x64', '\x21', '\x00', '\xfd'
};

'unit' is equivalent to 'unsigned char'. You can even embed several files like this. IMHO this method is more transparent to both the programmer and the compiler. The source to do this is pretty trivial:

\ 4tH binary to .h file converter - Copyright 2007 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ This file is geared toward the conversion of 4tH HX bytecode.
\ In order to convert other binary files, just change 'unit' to 'char'.
s" static unit " sconstant header      \ declaration header

include lib/argopen.4th                \ use ARG-OPEN word
include lib/ulcase.4th                 \ case conversion

     9 constant /line                  \ number of bytes per line
char ' constant quote                  \ single quote character
char , constant colon                  \ single colon character

/line string line                      \ input buffer

: .char ." '\x" <# # # #> s>lower type quote emit ;
: .char, .char colon emit space ;      ( n --)
: ?c@ dup if 1- chars + c@ else 2drop 0 then ;
: ?char if ?c@ .char else 2drop then ; ( a n f --)
: .header header type 1 args type ."  [] = {" cr ;
: .footer ." };" cr ;                  ( --)
: ?bounds space space over 0<> and if 1- then bounds ;
: read over over accept tuck <> ;      ( a n1 -- a n2 f)
: .line >r 2dup r@ ?bounds ?do i c@ .char, loop r@ ?char cr r> ;
: .lines hex begin line /line read .line until ;
: Usage argn 4 < abort" Usage: bin2h variable file h-file" ;
: OpenFiles Usage input 2 arg-open output 3 arg-open ;
: Convert Openfiles .header .lines .footer close close ;

Convert

Hans Bezemer
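
The byte-by-byte conversion described above can also be sketched in C. This is not the 4tH tool, just a rough equivalent under stated assumptions: bin2c is a hypothetical function name, the output uses 0xNN constants instead of '\xNN' characters, and 12 bytes per line is an arbitrary choice.

```c
#include <stdio.h>

/* Hypothetical helper: reads bytes from in until EOF and writes them
 * to out as the C definition "static unsigned char <name>[] = { ... };".
 * Returns the number of bytes converted, or -1 on an I/O error.
 * Note: an empty input produces an empty initializer, which C
 * compilers may reject. */
long bin2c(FILE *in, FILE *out, const char *name)
{
    long count = 0;
    int c;

    fprintf(out, "static unsigned char %s[] = {", name);
    while ((c = getc(in)) != EOF) {
        /* Comma between bytes; start a new line every 12 bytes. */
        fprintf(out, "%s%s0x%02x",
                count ? "," : "",
                count % 12 ? " " : "\n  ",
                (unsigned)c);
        count++;
    }
    fprintf(out, "\n};\n");

    return (ferror(in) || ferror(out)) ? -1 : count;
}
```

The input stream should be opened in binary mode ("rb") so that no bytes are translated on the way in.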

As I alluded to in my comment reply below about assembler output, you can create C (or assembler) data with standard Linux commands:

#!/bin/bash

if [[ $# -ne 1 ]]; then
    echo "Usage: $0 FILENAME"
    exit 1
fi
file=$1

if [[ ! -f "$file" ]]; then
    echo "File not found: $file"
    exit 1
fi

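# Derive a C identifier from the file name: '-' and '.' become '_'.
cname=$file
cname=${cname//-/_}
cname=${cname//./_}

echo "static unsigned char $cname[] = {"
# Emit 16 bytes per line as "0xNN, "; sed strips the empty slots
# that hexdump pads onto a short final line.
hexdump -v -e '" " 16/1 "  0x%02x, " "\n"' "$file" | \
   sed -e '$s/0x  ,//g'
echo "};"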

Using objcopy does this without the extra compilation step, although using the result is a bit more obscure. The other thing I like about using objcopy is that it doesn't leave a "temporary" ".c" file sitting around. Makes me nervous deleting ".c" files.

PS Try this, the hexdump command looks freaky but it actually does work!

__________________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.

On June 13th, 2008 Anonymous (not verified) says:

That is one of the most interesting things I have ever seen in this magazine. It's almost an introduction to how a linker works. It would be really excellent to expand upon this article, although I'm not expert enough to suggest in what way.

Thanks.

Use reswrap instead

On June 12th, 2008 Sander (not verified) says:

Or you can just use a utility called reswrap, which can convert any file into C/C++ data arrays. It's more portable and a lot easier to use.
It's part of the FOX toolkit (www.fox-toolkit.org):


Usage: reswrap [options] [-o[a] outfile] files...
Convert files containing images, text, or binary data into C/C++ data arrays.

Options:
-o[a] outfile Output [append] to outfile instead of stdout
-h Print help
-v Print version number
-d Output as decimal
-m Read files with MS-DOS mode (default is binary)
-x Output as hex (default)
-t[a] Output as [ascii] text string
-e Generate external reference declaration
-i Build an include file
-k Keep extension, separated by underscore
-s Suppress header in output file
-p prefix Place prefix in front of names of declarations and definitions
-n namespace Place declarations and definitions inside given namespace
-c cols Change number of columns in output to cols
-u Force unsigned char even for text mode
-z Output size in declarations

Each file may be preceded by the following extra option:
-r name Override resource name of following resource file

How about assembler?

On June 12th, 2008 Mitch Frazier says:

This will convert to assembly on stdout:
  echo ' .global data_txt'
  echo 'data_txt:'
  hexdump -v -e '" .byte " 16/1 "  0x%02x, " "\n"' data.txt | \
      sed -e '$s/0x  ,//g' -e 's/, *$//'
  echo ' .end'

Ehhh...

On June 19th, 2008 mkourinny says:


.globl data_begin
.data
data_begin:
.incbin "data.txt"
.globl data_end
data_end:

__________________________

Good luck to us,
Mikhail Kourinny

On June 19th, 2008 Anonymous (not verified) says:

Hi mkourinny & Mitch Frazier,

Both of your scripts mentioned above for assembly
give the same output.

But I don't understand what "converting to
assembly" means. Sorry if it sounds silly. I guess
it's converting an assembly file (.s) to hex bytes.

Thanks,
Ram

Not Quite

On June 19th, 2008 Mitch Frazier says:

It's converting a data file, containing any type of data, into text that is valid assembly language. The resulting output can then be passed to the assembler and "assembled" (i.e., compiled by the assembler) into an object file.

Some of the other comments mention converting the file to C and then compiling the C. This is the same idea, only the target language is assembly language rather than C.

The Linux assembler is invoked with the command "as"; it is sometimes referred to as "gas", for the GNU Assembler.


On June 19th, 2008 Ram (not verified) says:

Thank u much. :)
Sorry for posting many times.
It happened without my knowledge.

Ram

Didn't know that!

__________________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.
