Embedding a File in an Executable, aka Hello World, Version 5967
I recently had the need to embed a file in an executable. Since I'm working at the command line with gcc et al., and not with a fancy RAD tool that makes it all happen magically, it wasn't immediately obvious to me how to make this happen. A bit of searching on the net found a hack to essentially cat it onto the end of the executable and then decipher where it was based on a bunch of information I didn't want to know about. Seemed like there ought to be a better way...
And there is: objcopy to the rescue. objcopy converts object files or executables from one format to another. One of the formats it understands is "binary", which is basically any file that's not in one of the other formats that it understands. So you've probably envisioned the idea: convert the file that we want to embed into an object file, and then it can simply be linked in with the rest of our code.
Let's say we have a file named data.txt that we want to embed in our executable:
# cat data.txt
Hello world
To convert this into an object file that we can link with our program we just use objcopy to produce a ".o" file:
# objcopy --input binary \
--output elf32-i386 \
--binary-architecture i386 data.txt data.o
This tells objcopy that our input file is in the "binary" format and
that our output file should be in the "elf32-i386" format (object files on the x86).
The --binary-architecture option tells objcopy that the
output file is meant to "run" on an x86. This is needed so that ld
will accept the file for linking with other files for the x86.
One would think that specifying the output format as "elf32-i386" would imply this,
but it does not.
Now that we have an object file we only need to include it when we run the linker:
# gcc main.c data.o
When we run the result we get the prayed for output:
# ./a.out
Hello world
Of course, I haven't told the whole story yet, nor shown you main.c. When objcopy does the above conversion it adds some "linker" symbols to the converted object file:
_binary_data_txt_start
_binary_data_txt_end
After linking, these symbols specify the start and end of the embedded file. The symbol names are formed by prepending _binary_ and appending _start or _end to the file name. If the file name contains any characters that would be invalid in a symbol name they are converted to underscores (eg data.txt becomes data_txt). If you get unresolved names when linking using these symbols, do a hexdump -C on the object file and look at the end of the dump for the names that objcopy chose.
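The naming rule just described can be sketched in C as a quick way to predict the symbol for a given file. This is only an illustration: the helper name embedded_symbol_name is made up here, and it assumes that "invalid in a symbol name" means anything other than an ASCII letter or digit, and that the file name is spelled exactly as it was passed to objcopy (including any directory part).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build the symbol name objcopy is expected to
   generate for `file` with the given suffix ("start" or "end").
   Returns 0 on success, -1 if `out` is too small. */
int embedded_symbol_name(const char *file, const char *suffix,
                         char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "_binary_%s_%s", file, suffix);
    if (n < 0 || (size_t)n >= outsz)
        return -1;

    /* Assumption: every character that is not a letter or digit
       is replaced by an underscore. */
    for (char *p = out; *p != '\0'; ++p)
        if (!isalnum((unsigned char)*p))
            *p = '_';
    return 0;
}
```

Called with "data.txt" and "start", this produces _binary_data_txt_start, matching the symbol shown above. If the linker still complains, the names in the object file itself remain the final authority.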
The code to actually use the embedded file should now be reasonably obvious:
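#include <stdio.h>

/* Symbols added by objcopy: the first byte of the embedded file
   and the byte just past its end. */
extern char _binary_data_txt_start;
extern char _binary_data_txt_end;

int main(void)
{
    char* p = &_binary_data_txt_start;

    /* Print every byte of the embedded file. */
    while ( p != &_binary_data_txt_end ) putchar(*p++);
    return 0;
}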
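Since the two symbols bracket the embedded data, the length of the file is simply the distance between them. As a minimal sketch (the helper name write_embedded is hypothetical, not something objcopy or gcc provides), the byte-by-byte loop could be replaced with a single fwrite:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes in [start, end) to `out` in one
   call and return how many bytes were actually written. */
size_t write_embedded(const char *start, const char *end, FILE *out)
{
    size_t len = (size_t)(end - start);   /* size of the embedded data */
    return fwrite(start, 1, len, out);
}
```

In main.c this would be called as write_embedded(&_binary_data_txt_start, &_binary_data_txt_end, stdout). Unlike a C string, the embedded data has no terminating NUL, so the length has to come from the two addresses.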
Mitch Frazier is an Associate Editor for Linux Journal.
Comments
C++ Linkage
NB: In order to compile with C++, declare the symbols as follows.
extern "C" {
extern char _binary_data_txt_start;
extern char _binary_data_txt_end;
}
whoa the version number on
whoa the version number on this article!
For 64-bit x86, use --output elf64-x86-64. The --binary-architecture option need not change, again somewhat unintuitively.
It's the program version number
The version number is the version of the "hello world" program, not the article. And could somebody please come up with a new standard first program. If I see "hello world" in one more language I'm gonna spit-up :).
Mitch Frazier is an Associate Editor for Linux Journal.
so much stuff for little problem...
man xxd for "xxd -i":
cat input_file | ( echo "unsigned char xxx[] = {"; xxd -i; echo "};" ) > output_file.c
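For illustration, here is roughly what the generated file would look like for the data.txt used in the article, assuming it contains "Hello world" followed by a newline. The array name xxx comes from the command above; xxx_len is a made-up name added here so other code can learn the size.

```c
#include <stddef.h>

/* Stand-in for the generated output_file.c: the bytes of "Hello world\n"
   wrapped in the declaration produced by the two echo commands. */
unsigned char xxx[] = {
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};

/* sizeof gives the byte count because xxx is a real array in this file. */
const size_t xxx_len = sizeof xxx;
```

Another source file would declare extern unsigned char xxx[]; and extern const size_t xxx_len; and could then print the data with fwrite(xxx, 1, xxx_len, stdout).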
There is another, portable way to do this
I was facing exactly the same problem when I wanted to embed 4tH bytecode into an executable. The trick is to convert the file into a C-file that can be compiled properly with any C compiler. 4tH features a program to do that. In essence it works like this: you read the file in binary mode byte by byte and convert those bytes to unsigned characters. A converted file looks like this:
static unit HelloWorld [] = {
  '\x01', '\x02', '\x04', '\x00', '\xff', '\xff', '\xff', '\x7f', '\x04',
  '\x5c', '\x03', '\x08', '\x02', '\x02', '\x02', '\x0d', '\x08', '\x08',
  '\x08', '\x05', '\x08', '\x02', '\x48', '\x65', '\x6c', '\x6c', '\x6f',
  '\x20', '\x77', '\x6f', '\x72', '\x6c', '\x64', '\x21', '\x00', '\xfd'
};
'unit' is equivalent to 'unsigned char'. You can even embed several files like this. IMHO this method is more transparent to both the programmer and the compiler. The source to do this is pretty trivial:
\ 4tH binary to .h file converter - Copyright 2007 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ This file is geared toward the conversion of 4tH HX bytecode.
\ In order to convert other binary files, just change 'unit' to 'char'.

s" static unit " sconstant header      \ declaration header

include lib/argopen.4th                \ use ARG-OPEN word
include lib/ulcase.4th                 \ case conversion

9 constant /line                       \ number of bytes per line
char ' constant quote                  \ single quote character
char , constant colon                  \ single colon character

/line string line                      \ input buffer

: .char ." '\x" <# # # #> s>lower type quote emit ;
: .char, .char colon emit space ;      ( n --)
: ?c@ dup if 1- chars + c@ else 2drop 0 then ;
: ?char if ?c@ .char else 2drop then ; ( a n f --)
: .header header type 1 args type ." [] = {" cr ;
: .footer ." };" cr ;                  ( --)
: ?bounds space space over 0<> and if 1- then bounds ;
: read over over accept tuck <> ;      ( a n1 -- a n2 f)
: .line >r 2dup r@ ?bounds ?do i c@ .char, loop r@ ?char cr r> ;
: .lines hex begin line /line read .line until ;
: Usage argn 4 < abort" Usage: bin2h variable file h-file" ;
: OpenFiles Usage input 2 arg-open output 3 arg-open ;
: Convert Openfiles .header .lines .footer close close ;

Convert

Hans Bezemer
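For readers who don't speak Forth, the same byte-by-byte conversion can be sketched in C. This is not a port of the 4tH program, just a minimal illustration of the idea: the function name bin_to_c_array is made up, it writes plain unsigned char instead of 'unit', and it borrows the nine-bytes-per-line layout from the converter above.

```c
#include <stdio.h>

/* Hypothetical converter: read `in` byte by byte and write a C array
   definition named `name` to `out`. Returns the number of bytes read.
   The caller should open `in` in binary mode ("rb"). */
long bin_to_c_array(FILE *in, FILE *out, const char *name)
{
    long count = 0;
    int c;

    fprintf(out, "static unsigned char %s[] = {", name);
    while ((c = getc(in)) != EOF) {
        /* Comma before every entry except the first; start a new
           indented line every 9 bytes. */
        fprintf(out, "%s%s'\\x%02x'",
                count ? "," : "",
                count % 9 == 0 ? "\n  " : " ",
                (unsigned)c);
        count++;
    }
    fprintf(out, "\n};\n");
    return count;
}
```

A small main that opens the input with fopen(path, "rb") and passes stdout as the output is all that is needed to turn this into a command-line tool. Note that an empty input file yields an empty initializer, which older C compilers reject, so a real tool would want to handle that case.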
Same Thing Using "Standard" Linux Commands
As I alluded to in my comment reply below about assembler output, you can create C (or assembler) data with standard Linux commands:
Using objcopy does this without the extra compilation step, although using the result is a bit more obscure. The other thing I like about using objcopy is that it doesn't leave a "temporary" ".c" file sitting around. Makes me nervous deleting ".c" files.
PS Try this, the hexdump command looks freaky but it actually does work!
Mitch Frazier is an Associate Editor for Linux Journal.
That is one of the most
That is one of the most interesting things I have ever seen in this magazine. It's almost an introduction to how a linker works. It would be really excellent to expand upon this article, although I'm not expert enough to suggest in what way.
Thanks.
Use reswrap instead
Or you can just use a utility called reswrap, which can convert any file into C/C++ data arrays. It's more portable and a lot easier to use.
It's part of the fox toolkit. (www.fox-toolkit.org):
Usage: reswrap [options] [-o[a] outfile] files...
Convert files containing images, text, or binary data into C/C++ data arrays.
Options:
-o[a] outfile Output [append] to outfile instead of stdout
-h Print help
-v Print version number
-d Output as decimal
-m Read files with MS-DOS mode (default is binary)
-x Output as hex (default)
-t[a] Output as [ascii] text string
-e Generate external reference declaration
-i Build an include file
-k Keep extension, separated by underscore
-s Suppress header in output file
-p prefix Place prefix in front of names of declarations and definitions
-n namespace Place declarations and definitions inside given namespace
-c cols Change number of columns in output to cols
-u Force unsigned char even for text mode
-z Output size in declarations
Each file may be preceded by the following extra option:
-r name Override resource name of following resource file
How about assembler?
echo ' .global data_txt'
echo 'data_txt:'
hexdump -v -e '" .byte " 16/1 " 0x%02x, " "\n"' data.txt | \
    sed -e '$s/0x ,//g' -e 's/, *$//'
echo ' .end'
Mitch Frazier is an Associate Editor for Linux Journal.
Ehhh...
.globl data_begin
.data
data_begin:
.incbin "data.txt"
.globl data_end
data_end:
Good luck to us,
Mikhail Kourinny
Macro version
(Thank you for the initial code that got me started.)
I turned the code into a macro, got rid of the global data_end and replaced it with data_len. You could go one big step forward and create a common header file containing the assembly and C macros. It could also contain a macro for C++. Then, just ifdef the macros based on the compiler flags. Then, you can just #include the same file, I think, in many places.
// Common Include File: test.h
#ifdef __ASSEMBLER__
.altmacro
.macro binfile p q
.globl \p&_begin
\p&_begin:
.incbin \q
\p&_end:
// Put a ".byte 0" here if you know your data is text
// and you wish to use \p&_begin as a C string. It
// doesn't hurt to leave it here even for binary data
// since it is not counted in \p_&len
.byte 0
.globl \p&_len
\p&_len:
.int (\p&_end - \p&_begin)
.endm
#else // Not __ASSEMBLER__
#ifdef __cplusplus
extern "C" {
#endif
#define BIN_DATA(_NAME) \
extern char _NAME##_begin; \
extern int _NAME##_len
#ifdef __cplusplus
}
#endif
#endif
// Assembly: test.S
// C or C++:
Hi mkourinny & Mitch
Hi mkourinny & Mitch Frazier,
Both of ur scripts mentioned above for assembly
give the same output.
But I don't understand what does "Converting to
assembly mean". Sorry if it sounds silly. I guess
its converting an assembly file (.s) to hex bytes.
Thanks,
Ram
Not Quite
It's converting a data file, of any type of data, into text that is valid assembly language. The resulting output could then be passed to the assembler and "assembled" (i.e. compiled by the assembler) into an object file.
Some of the other comments mention converting it to C and then compiling the C; this is the same idea, only the target language is assembly language and not C.
The Linux assembler is a program invoked with the command "as"; it is sometimes referred to as "gas", for the GNU Assembler.
Mitch Frazier is an Associate Editor for Linux Journal.
Thank u much. :) Sorry for
Thank u much. :)
Sorry for posting many times.
It happened without my knowledge.
Ram
At Last an Assembly Language Programmer
Didn't know that!
Mitch Frazier is an Associate Editor for Linux Journal.