Embedding a File in an Executable, aka Hello World, Version 5967
I recently needed to embed a file in an executable. Since I work at the command line with gcc and friends, and not with a fancy RAD tool that makes it all happen magically, it wasn't immediately obvious to me how to do this. A bit of searching on the net turned up a hack: cat the file onto the end of the executable, then work out where it is using a bunch of information I didn't want to know about. It seemed like there ought to be a better way...
And there is: objcopy to the rescue. objcopy converts object files or executables from one format to another. One of the formats it understands is "binary", which is basically any file that's not in one of the other formats it understands. So you've probably guessed the idea: convert the file that we want to embed into an object file, and then simply link it in with the rest of our code.
Let's say we have a file named data.txt that we want to embed in our executable:
# cat data.txt
Hello world
To convert this into an object file that we can link with our program we just use objcopy to produce a ".o" file:
# objcopy --input binary \
--output elf32-i386 \
--binary-architecture i386 data.txt data.o
This tells objcopy that our input file is in the "binary" format and
that our output file should be in the "elf32-i386" format (object files on the x86).
The --binary-architecture option tells objcopy that the
output file is meant to "run" on an x86. This is needed so that ld
will accept the file for linking with other files for the x86.
One would think that specifying the output format as "elf32-i386" would imply this,
but it does not.
Now that we have an object file we only need to include it when we run the linker:
# gcc main.c data.o
When we run the result we get the prayed-for output:
# ./a.out
Hello world
Of course, I haven't told the whole story yet, nor shown you main.c. When objcopy does the above conversion it adds some "linker" symbols to the converted object file:
_binary_data_txt_start
_binary_data_txt_end
After linking, these symbols specify the start and end of the embedded file. The symbol names are formed by prepending _binary_ and appending _start or _end to the file name. If the file name contains any characters that would be invalid in a symbol name, they are converted to underscores (e.g., data.txt becomes data_txt). If you get unresolved names when linking using these symbols, do a hexdump -C on the object file and look at the end of the dump for the names that objcopy chose.
The code to actually use the embedded file should now be reasonably obvious:
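#include <stdio.h>

/* Symbols created by objcopy; only their addresses are meaningful. */
extern char _binary_data_txt_start;
extern char _binary_data_txt_end;

int main(void)
{
    /* Walk from the start symbol to the end symbol, printing each byte. */
    char* p = &_binary_data_txt_start;
    while ( p != &_binary_data_txt_end ) putchar(*p++);
    return 0;
}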
Mitch Frazier is an Associate Editor for Linux Journal.
Comments
C++ Linkage
NB: In order to compile with C++, declare the symbols as follows.
extern "C" {
extern char _binary_data_txt_start;
extern char _binary_data_txt_end;
}
whoa the version number on this article!
For 64-bit x86, use --output elf64-x86-64. The --binary-architecture option need not change, again somewhat unintuitively.
It's the program version number
The version number is the version of the "hello world" program, not the article. And could somebody please come up with a new standard first program? If I see "hello world" in one more language I'm gonna spit up :).
Mitch Frazier is an Associate Editor for Linux Journal.
so much stuff for little problem...
man xxd for "xxd -i":
cat input_file | ( echo "unsigned char xxx[] = {"; xxd -i; echo "};" ) > output_file.c
There is another, portable way to do this
I was facing exactly the same problem when I wanted to embed 4tH bytecode into an executable. The trick is to convert the file into a C-file that can be compiled properly with any C compiler. 4tH features a program to do that. In essence it works like this: you read the file in binary mode byte by byte and convert those bytes to unsigned characters. A converted file looks like this:
static unit HelloWorld [] = {
  '\x01', '\x02', '\x04', '\x00', '\xff', '\xff', '\xff', '\x7f', '\x04',
  '\x5c', '\x03', '\x08', '\x02', '\x02', '\x02', '\x0d', '\x08', '\x08',
  '\x08', '\x05', '\x08', '\x02', '\x48', '\x65', '\x6c', '\x6c', '\x6f',
  '\x20', '\x77', '\x6f', '\x72', '\x6c', '\x64', '\x21', '\x00', '\xfd'
};
'unit' is equivalent to 'unsigned char'. You can even embed several files like this. IMHO this method is more transparent to both the programmer and the compiler. The source to do this is pretty trivial:
\ 4tH binary to .h file converter - Copyright 2007 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ This file is geared toward the conversion of 4tH HX bytecode.
\ In order to convert other binary files, just change 'unit' to 'char'.

s" static unit " sconstant header      \ declaration header

include lib/argopen.4th                \ use ARG-OPEN word
include lib/ulcase.4th                 \ case conversion

9 constant /line                       \ number of bytes per line
char ' constant quote                  \ single quote character
char , constant colon                  \ single colon character

/line string line                      \ input buffer

: .char ." '\x" <# # # #> s>lower type quote emit ;
: .char, .char colon emit space ;      ( n --)
: ?c@ dup if 1- chars + c@ else 2drop 0 then ;
: ?char if ?c@ .char else 2drop then ; ( a n f --)
: .header header type 1 args type ." [] = {" cr ;
: .footer ." };" cr ;                  ( --)
: ?bounds space space over 0<> and if 1- then bounds ;
: read over over accept tuck <> ;      ( a n1 -- a n2 f)
: .line >r 2dup r@ ?bounds ?do i c@ .char, loop r@ ?char cr r> ;
: .lines hex begin line /line read .line until ;
: Usage argn 4 < abort" Usage: bin2h variable file h-file" ;
: OpenFiles Usage input 2 arg-open output 3 arg-open ;
: Convert Openfiles .header .lines .footer close close ;

Convert

Hans Bezemer
Same Thing Using "Standard" Linux Commands
As I alluded to in my comment reply below about assembler output, you can create C (or assembler) data with standard Linux commands:
Using objcopy does this without the extra compilation step, although using the result is a bit more obscure. The other thing I like about using objcopy is that it doesn't leave a "temporary" ".c" file sitting around. Makes me nervous deleting ".c" files.
PS Try this, the hexdump command looks freaky but it actually does work!
Mitch Frazier is an Associate Editor for Linux Journal.
That is one of the most interesting things I have ever seen in this magazine. It's almost an introduction to how a linker works. It would be really excellent to expand upon this article, although I'm not expert enough to suggest in what way.
Thanks.
Use reswrap instead
Or you can just use a utility called reswrap, which can convert any file into C/C++ data arrays. It's more portable and a lot easier to use.
It's part of the FOX toolkit (www.fox-toolkit.org):
Usage: reswrap [options] [-o[a] outfile] files...
Convert files containing images, text, or binary data into C/C++ data arrays.
Options:
-o[a] outfile Output [append] to outfile instead of stdout
-h Print help
-v Print version number
-d Output as decimal
-m Read files with MS-DOS mode (default is binary)
-x Output as hex (default)
-t[a] Output as [ascii] text string
-e Generate external reference declaration
-i Build an include file
-k Keep extension, separated by underscore
-s Suppress header in output file
-p prefix Place prefix in front of names of declarations and definitions
-n namespace Place declarations and definitions inside given namespace
-c cols Change number of columns in output to cols
-u Force unsigned char even for text mode
-z Output size in declarations
Each file may be preceded by the following extra option:
-r name Override resource name of following resource file
How about assembler?
echo ' .global data_txt'
echo 'data_txt:'
hexdump -v -e '" .byte " 16/1 " 0x%02x, " "\n"' data.txt | \
    sed -e '$s/0x ,//g' -e 's/, *$//'
echo ' .end'
Mitch Frazier is an Associate Editor for Linux Journal.
Ehhh...
.globl data_begin
.data
data_begin:
.incbin "data.txt"
.globl data_end
data_end:
Good luck to us,
Mikhail Kourinny
Macro version
(Thank you for the initial code that got me started.)
I turned the code into a macro, got rid of the global data_end and replaced it with data_len. You could go one big step forward and create a common header file containing the assembly and C macros. It could also contain a macro for C++. Then, just ifdef the macros based on the compiler flags. Then, you can just #include the same file, I think, in many places.
// Common Include File: test.h
#ifdef __ASSEMBLER__

.altmacro
.macro binfile p q
    .globl \p&_begin
\p&_begin:
    .incbin \q
\p&_end:
    // Put a ".byte 0" here if you know your data is text
    // and you wish to use \p&_begin as a C string. It
    // doesn't hurt to leave it here even for binary data
    // since it is not counted in \p_&len
    .byte 0
    .globl \p&_len
\p&_len:
    .int (\p&_end - \p&_begin)
.endm

#else // Not __ASSEMBLER__

#ifdef __cplusplus
extern "C" {
#endif

#define BIN_DATA(_NAME) \
    extern char _NAME##_begin; \
    extern int _NAME##_len

#ifdef __cplusplus
}
#endif

#endif
// Assembly: test.S
// C or C++:
Hi mkourinny & Mitch Frazier,
Both of your scripts mentioned above for assembly give the same output.
But I don't understand what "converting to assembly" means. Sorry if it sounds silly. I guess it's converting an assembly file (.s) to hex bytes.
Thanks,
Ram
Not Quite
It's converting a data file, of any type of data, into text that is valid assembly language. The resulting output could then be passed to the assembler and "assembled" (i.e., compiled by the assembler) into an object file.
Some of the other comments mention converting it to C and then compiling the C. This is the same idea, only the target language is assembly language and not C.
The Linux assembler is a program invoked with the command "as"; it is sometimes referred to as "gas", for the GNU Assembler.
Mitch Frazier is an Associate Editor for Linux Journal.
Thank you very much. :)
Sorry for posting many times.
It happened without my knowledge.
Ram
At Last an Assembly Language Programmer
Didn't know that!
Mitch Frazier is an Associate Editor for Linux Journal.