Embedding a File in an Executable, aka Hello World, Version 5967
June 12th, 2008 by Mitch Frazier
I recently needed to embed a file in an executable. Since I'm working at the command line with gcc et al., and not with a fancy RAD tool that makes it all happen magically, it wasn't immediately obvious to me how to do this. A bit of searching on the net turned up a hack: essentially cat the file onto the end of the executable and then work out where it is based on a bunch of information I didn't want to know about. It seemed like there ought to be a better way...
And there is: objcopy to the rescue. objcopy converts object files or executables from one format to another. One of the formats it understands is "binary", which is basically any file that's not in one of the other formats it understands. So you've probably guessed the idea: convert the file that we want to embed into an object file, and then it can simply be linked in with the rest of our code.
Let's say we have a file named data.txt that we want to embed in our executable:
# cat data.txt
Hello world
To convert this into an object file that we can link with our program we just use objcopy to produce a ".o" file:
# objcopy --input binary \
--output elf32-i386 \
--binary-architecture i386 data.txt data.o
This tells objcopy that our input file is in the "binary" format and
that our output file should be in the "elf32-i386" format (object files on the x86).
The --binary-architecture option tells objcopy that the
output file is meant to "run" on an x86. This is needed so that ld
will accept the file for linking with other files for the x86.
One would think that specifying the output format as "elf32-i386" would imply this,
but it does not.
Now that we have an object file we only need to include it when we run the linker:
# gcc main.c data.o
When we run the result we get the prayed-for output:
# ./a.out
Hello world
Of course, I haven't told the whole story yet, nor shown you main.c. When objcopy does the above conversion it adds some "linker" symbols to the converted object file:
_binary_data_txt_start
_binary_data_txt_end
After linking, these symbols specify the start and end of the embedded file. The symbol names are formed by prepending _binary_ and appending _start or _end to the file name. If the file name contains any characters that would be invalid in a symbol name they are converted to underscores (e.g., data.txt becomes data_txt). If you get unresolved names when linking using these symbols, do a hexdump -C on the object file and look at the end of the dump for the names that objcopy chose.
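The naming rule can be sketched in C. This is only an illustration of the rule as described above, assuming objcopy replaces every character that is not a letter or digit with an underscore and uses the input name exactly as typed on its command line (so a leading directory would become part of the symbol). The helper name embedded_symbol_name is hypothetical. Running nm on the object file remains the reliable way to see the names objcopy really chose.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the symbol name objcopy is expected to generate for `path`
 * (assumption: the name exactly as given to objcopy). `suffix` is
 * "start" or "end". Returns 0 on success, -1 if `out` is too small.
 * embedded_symbol_name is a hypothetical helper for illustration.
 */
int embedded_symbol_name(const char *path, const char *suffix,
                         char *out, size_t out_size)
{
    assert(path != NULL && suffix != NULL && out != NULL);

    /* "_binary_" + path + "_" + suffix + terminating NUL */
    size_t need = strlen("_binary_") + strlen(path) + 1 + strlen(suffix) + 1;
    if (need > out_size)
        return -1;

    snprintf(out, out_size, "_binary_%s_%s", path, suffix);

    /* Replace every character that is not a letter or digit with '_'. */
    for (char *p = out; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p))
            *p = '_';
    }
    return 0;
}
```

Called with "data.txt" and "start", the helper fills the buffer with _binary_data_txt_start, matching the name shown above.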
The code to actually use the embedded file should now be reasonably obvious:
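#include <stdio.h>

/* Symbols that objcopy adds to data.o; only their addresses are meaningful. */
extern char _binary_data_txt_start;
extern char _binary_data_txt_end;

int main(void)
{
    /* Walk from the start symbol to the end symbol, printing each byte. */
    char* p = &_binary_data_txt_start;
    while ( p != &_binary_data_txt_end ) putchar(*p++);
    return 0;
}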
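Because the embedded data is just raw bytes with no terminating NUL, it is safer to treat it as a (start, end) pair than as a C string. The following is a minimal sketch of that idea; embedded_size and embedded_write are hypothetical helper names, and they take the bounds as parameters so they can be tried on any buffer without linking data.o.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * In a real program the two bounds would be the objcopy symbols, e.g.:
 *
 *     extern const char _binary_data_txt_start[];
 *     extern const char _binary_data_txt_end[];
 *
 * Declaring them as arrays lets the names be used directly as pointers.
 */

/* Number of bytes between the start and end markers. */
size_t embedded_size(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && start <= end);
    return (size_t)(end - start);
}

/* Write the whole embedded region to `out`; returns bytes written. */
size_t embedded_write(FILE *out, const char *start, const char *end)
{
    return fwrite(start, 1, embedded_size(start, end), out);
}
```

With the array declarations from the comment in place, main could simply call embedded_write(stdout, _binary_data_txt_start, _binary_data_txt_end).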
Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.
whoa the version number on
On June 18th, 2008 stabu (not verified) says:
whoa the version number on this article!
For 64-bit x86 systems, use --output elf64-x86-64. The --binary-architecture option need not change, again somewhat unintuitively.
It's the program version number
On June 19th, 2008 Mitch Frazier says:
The version number is the version of the "hello world" program, not the article. And could somebody please come up with a new standard first program? If I see "hello world" in one more language I'm gonna spit up :).
so much stuff for little problem...
On June 16th, 2008 Anonymous (not verified) says:
man xxd for "xxd -i":
cat input_file | ( echo "unsigned char xxx[] = {"; xxd -i; echo "};" ) > output_file.c
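When xxd -i is given a file name instead of reading standard input, it writes the array declaration itself, naming the array after the file and adding a length variable. The sketch below uses a hand-written stand-in for that output, assuming data.txt holds "Hello world" followed by a newline. The helper print_embedded is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hand-written stand-in for the output of `xxd -i data.txt`
 * (assumes the file is "Hello world" plus a trailing newline). */
unsigned char data_txt[] = {
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int data_txt_len = 12;

/* Print an embedded array to `out`; returns the number of bytes written. */
size_t print_embedded(FILE *out, const unsigned char *data, unsigned int len)
{
    assert(out != NULL && data != NULL);
    return fwrite(data, 1, len, out);
}
```

A program would then call print_embedded(stdout, data_txt, data_txt_len). The explicit length matters because the array is not NUL-terminated.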
There is another, portable way to do this
On June 13th, 2008 Hans Bezemer (not verified) says:
I was facing exactly the same problem when I wanted to embed 4tH bytecode into an executable. The trick is to convert the file into a C-file that can be compiled properly with any C compiler. 4tH features a program to do that. In essence it works like this: you read the file in binary mode byte by byte and convert those bytes to unsigned characters. A converted file looks like this:
static unit HelloWorld [] = {
  '\x01', '\x02', '\x04', '\x00', '\xff', '\xff', '\xff', '\x7f', '\x04',
  '\x5c', '\x03', '\x08', '\x02', '\x02', '\x02', '\x0d', '\x08', '\x08',
  '\x08', '\x05', '\x08', '\x02', '\x48', '\x65', '\x6c', '\x6c', '\x6f',
  '\x20', '\x77', '\x6f', '\x72', '\x6c', '\x64', '\x21', '\x00', '\xfd'
};
'unit' is equivalent to 'unsigned char'. You can even embed several files like this. IMHO this method is more transparent to both the programmer and the compiler. The source to do this is pretty trivial:
\ 4tH binary to .h file converter - Copyright 2007 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ This file is geared toward the conversion of 4tH HX bytecode.
\ In order to convert other binary files, just change 'unit' to 'char'.

s" static unit " sconstant header      \ declaration header

include lib/argopen.4th                \ use ARG-OPEN word
include lib/ulcase.4th                 \ case conversion

9 constant /line                       \ number of bytes per line
char ' constant quote                  \ single quote character
char , constant colon                  \ single colon character
/line string line                      \ input buffer

: .char ." '\x" <# # # #> s>lower type quote emit ;
: .char, .char colon emit space ;      ( n --)
: ?c@ dup if 1- chars + c@ else 2drop 0 then ;
: ?char if ?c@ .char else 2drop then ; ( a n f --)
: .header header type 1 args type ." [] = {" cr ;
: .footer ." };" cr ;                  ( --)
: ?bounds space space over 0<> and if 1- then bounds ;
: read over over accept tuck <> ;      ( a n1 -- a n2 f)
: .line >r 2dup r@ ?bounds ?do i c@ .char, loop r@ ?char cr r> ;
: .lines hex begin line /line read .line until ;
: Usage argn 4 < abort" Usage: bin2h variable file h-file" ;
: OpenFiles Usage input 2 arg-open output 3 arg-open ;
: Convert Openfiles .header .lines .footer close close ;

Convert

Hans Bezemer
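For readers who don't use 4tH, the same byte-by-byte conversion can be sketched in C. This is not a port of the program above, just a minimal illustration of the idea: emit_c_array is a hypothetical helper that writes a buffer out as a C array definition, eight bytes per row, using hex integer literals rather than character literals.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write `len` bytes from `data` to `out` as a C array named `name`,
 * eight bytes per row. Returns 0 on success, -1 on a write error.
 * `len` must be at least 1, since an empty initializer list is not
 * valid for an array of unspecified size.
 * emit_c_array is a hypothetical helper, not part of 4tH.
 */
int emit_c_array(FILE *out, const char *name,
                 const unsigned char *data, size_t len)
{
    assert(out != NULL && name != NULL && data != NULL && len > 0);

    fprintf(out, "static const unsigned char %s[] = {\n", name);
    for (size_t i = 0; i < len; i++) {
        const char *sep = (i + 1 == len) ? "\n"    /* last byte    */
                        : (i % 8 == 7)   ? ",\n"   /* end of a row */
                        :                  ", ";
        fprintf(out, "%s0x%02x%s", (i % 8 == 0) ? "  " : "", data[i], sep);
    }
    fprintf(out, "};\n");
    return ferror(out) ? -1 : 0;
}
```

A small driver would read the input file into a buffer, call emit_c_array on it, and write the result to a header that the rest of the program includes.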
Same Thing Using "Standard" Linux Commands
On June 13th, 2008 Mitch Frazier says:
As I alluded to in my reply below about assembler output, you can create C (or assembler) data with standard Linux commands:
Using objcopy does this without the extra compilation step, although using the result is a bit more obscure. The other thing I like about using objcopy is that it doesn't leave a "temporary" ".c" file sitting around. Makes me nervous deleting ".c" files.
PS Try this, the hexdump command looks freaky but it actually does work!
That is one of the most
On June 13th, 2008 Anonymous (not verified) says:
That is one of the most interesting things I have ever seen in this magazine. It's almost an introduction to how a linker works. It would be really excellent to expand upon this article, although I'm not expert enough to suggest in what way.
Thanks.
Use reswrap instead
On June 12th, 2008 Sander (not verified) says:
Or you can just use a utility called reswrap, which can convert any file into C/C++ data arrays. It's more portable and a lot easier to use.
It's part of the FOX toolkit (www.fox-toolkit.org):
Usage: reswrap [options] [-o[a] outfile] files...
Convert files containing images, text, or binary data into C/C++ data arrays.
Options:
-o[a] outfile Output [append] to outfile instead of stdout
-h Print help
-v Print version number
-d Output as decimal
-m Read files with MS-DOS mode (default is binary)
-x Output as hex (default)
-t[a] Output as [ascii] text string
-e Generate external reference declaration
-i Build an include file
-k Keep extension, separated by underscore
-s Suppress header in output file
-p prefix Place prefix in front of names of declarations and definitions
-n namespace Place declarations and definitions inside given namespace
-c cols Change number of columns in output to cols
-u Force unsigned char even for text mode
-z Output size in declarations
Each file may be preceded by the following extra option:
-r name Override resource name of following resource file
How about assembler?
On June 12th, 2008 Mitch Frazier says:
echo ' .global data_txt'
echo 'data_txt:'
hexdump -v -e '" .byte " 16/1 " 0x%02x, " "\n"' data.txt | \
sed -e '$s/0x ,//g' -e 's/, *$//'
echo ' .end'
Ehhh...
On June 19th, 2008 mkourinny says:
.globl data_begin
.data
data_begin:
.incbin "data.txt"
.globl data_end
data_end:
Good luck to us,
Mikhail Kourinny
Hi mkourinny & Mitch
On June 19th, 2008 Anonymous (not verified) says:
Hi mkourinny & Mitch Frazier,
Both of your scripts mentioned above for assembly
give the same output.
But I don't understand what "converting to
assembly" means. Sorry if it sounds silly. I guess
it's converting an assembly file (.s) to hex bytes.
Thanks,
Ram
Not Quite
On June 19th, 2008 Mitch Frazier says:
It's converting a data file, containing any type of data, into text that is valid assembly language. The resulting output could then be passed to the assembler and "assembled" (i.e., compiled by the assembler) into an object file.
Some of the other comments mention converting the file to C and then compiling the C. This is the same idea, only the target language is assembly language and not C.
The Linux assembler is a program invoked with the command "as". It is sometimes referred to as "gas", for the GNU Assembler.
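To make the idea concrete, here is a minimal C sketch of such a conversion: it writes a buffer out as assembler source consisting of a global label followed by .byte directives, 16 values per line, roughly what the hexdump pipeline in the earlier comment produces. The function name emit_asm_bytes is hypothetical, and the output is assumed to be fed to the GNU assembler.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write `len` bytes as assembler source: a global label followed by
 * .byte directives, 16 values per line. Returns 0 on success, -1 on a
 * write error. emit_asm_bytes is a hypothetical helper.
 */
int emit_asm_bytes(FILE *out, const char *label,
                   const unsigned char *data, size_t len)
{
    assert(out != NULL && label != NULL && (data != NULL || len == 0));

    fprintf(out, "\t.global %s\n%s:\n", label, label);
    for (size_t i = 0; i < len; i++) {
        if (i % 16 == 0)
            fputs("\t.byte ", out);          /* start a new row */
        int last_in_row = (i % 16 == 15) || (i + 1 == len);
        fprintf(out, "0x%02x%s", data[i], last_in_row ? "\n" : ", ");
    }
    return ferror(out) ? -1 : 0;
}
```

The text this produces can be saved to a .s file, assembled with as, and linked like any other object file, with the label declared extern on the C side.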
Thank u much. :) Sorry for
On June 19th, 2008 Ram (not verified) says:
Thank u much. :)
Sorry for posting many times.
It happened without my knowledge.
Ram
At Last an Assembly Language Programmer
On June 19th, 2008 Mitch Frazier says:
Didn't know that!