Porting DOS Applications to Linux

HOWTOs

by Alan Cox

on September 1, 1995

With a little care, the average DOS application can be easily ported to the Linux system. This article looks at some of the techniques involved, and tries to provide a small “builder's kit” of handy little DOS routines people always want under Linux.

Memory Models

DOS programs written in the C and C++ languages generally run in a variety of different memory models with their own segmentation semantics. The simplest is the “tiny” model, where all of the program and data are referenced off one segment. All three segment registers (CS, DS, and SS) point to the same place to suit the way the processor wishes to work. The Linux kernel executes programs in the 32-bit equivalent of tiny mode. Because offsets are 32-bit, not 16-bit, a program can utilise 4GB of address space before segmentation becomes an issue. Thus you get the simplicity of tiny model without the limitations.

As a result of this the DOS keywords near, far, and huge, have no meaning to Linux. These can be removed, or if you are trying to maintain a common source tree, you can add these lines instead:

#if defined(__linux__)
#define far
#define near
#define huge
#define register
#endif

gcc, the normal Linux C compiler, understands the register keyword, but the code optimiser is sufficiently good that using register is normally a bad idea.

Many DOS C compilers support an inline keyword. gcc also supports this.

C Types Supported

gcc supports all the ANSI C types you would expect and some extensions. The size of the normal types is, however, different from that of DOS compilers, and frequently causes problems when porting. Here is a summary for sizes on Linux/i386 (Linux on other architectures, such as the 64-bit Alpha, will differ in some respects):

Type Name       Linux           DOS time/small  DOS large       DOS huge
char            8 bits          8 bits          8 bits          8 bits
short           16 bits         16 bits         16 bits         16 bits
int             32 bits         16 bits         16 bits         16 bits
long            32 bits         32 bits         32 bits         32 bits
pointer         32 bits         16 bits         32 bits         32 bits
largest array   4GB*            64KB            64KB            640KB

* Actually, because some of the address space is reserved and used for other things, you can't get above about 2GB at the moment.

DOS programmers generally make good use of prototypes to avoid mysterious crashes caused by passing the wrong type. Mixing short and long under Linux normally just results in mysterious value changes in passed parameters, so the habit of prototyping is a good one to get into. Furthermore, you can tell gcc to warn you about any routine which has no prototype by adding the compiler flag -Wstrict-prototypes. All of the C library and system calls have prototypes, provided the correct header files are included.

gcc: the GNU C Compiler

The GNU C compiler is an extremely flexible tool. Although it compiles much slower than most of the DOS compilers, and is (intentionally) without an Integrated Development Environment, it has a wide range of abilities and flexibility that few DOS compilers can touch. People who have used DJGPP to write 32-bit DOS extender programs will be familiar with gcc, although in its Linux and Unix form it is somewhat easier to work with.

It is worth knowing how to tell gcc how to cope with different “flavours” of code. It can become a traditional K&R C compiler, by using the -traditional option, a strict ANSI compiler, by using the -ansi option, or a GNU C compiler—ANSI + GNU extensions. In addition, you can ask it to perform a wide range of sanity checks with the -pedantic and -Wall options. For a typical program, the compiler will generate a lot of warnings, many of which will give insights into potential problems. For example, the compiler will check to see that the conversion options in the format strings of printf()/scanf() and their family of functions match the types of the variables they will interpret.

The optimiser is controllable both by a general level of optimisations, using the -O1 or -O2 options, and on a per-optimisation basis for those speed-critical special cases. The optimiser performs a wide range of peephole and global optimisations, including intelligent allocation of registers, loop unrolling, and even instruction scheduling on RISC CPUs.

The GNU C compiler, linker, and debugger are all described in complete documents available from the Free Software Foundation, which you can either buy as books (the money goes to fund more free software work) or print yourself.

To cover the compiler, debugging tools, make, and other programs in full would require several more articles. If the documentation and documentation viewer are all installed, typing info gdb, info gcc, and info ld should give you a good start. (If the info program is not installed, the Emacs editor can also be used to read documentation in the info format.) Fans of graphical user interfaces may also like to pick up tgdb as a graphical front end for the gdb debugger, and xwpe, a look-alike of a well-known DOS C development environment, built on top of gcc, make, and gdb.

Where is the foo() Function?

This is commonly asked of a specific set of DOS C library and system functions, most notably the various text mode window packages, kbhit(), getch(), getche(), and the string functions stricmp() and strnicmp().

Not surprisingly, equivalent functionality exists under Linux. The text mode window case warrants a section of its own, so you will have to wait a minute or skip on ahead. The string functions are nice and easy. stricmp() is also known as strcasecmp() and strnicmp() as strncasecmp()---there is just a naming difference.

The keyboard I/O routines cause problems because Unix terminal I/O is a lot more flexible than DOS terminal I/O, and in the case of kbhit() it does not suit the multitasking nature of the system to have the CPU spend all its time in loops polling the keyboard. Furthermore, unlike in DOS, the terminal mode is set explicitly, rather implicitly by each call. A set of routines (the standard POSIX termios functions) exist to manipulate the control structures for each device.

You cannot use the various privileged instructions that might otherwise harm the machine integrity. Controlling I/O devices is done by devices in the kernel, with access through file abstraction (the “special” files in /dev/), rather than directly. It is possible (but dangerous) to use ioperm() to allow access to devices in a process running as root if absolutely necessary. Any code that does this will be non-portable, and thus this should be used for special purposes only. mmap() can be used for equivalent access to the device memory window on a PC (640KB-1MB), but this is equally as bad an idea for normal use. In particular, you should never, ever, attempt to do screen output this way.

Executing Other Programs

The Linux environment is built on the basis of small, effective, programs working together. Therefore, there are a wide variety of program execution facilities. They differ from the DOS environment in very specific ways. There are no equivalents to the various “swap out existing program and spawn another task” facilities found under DOS. The Linux virtual memory system will automatically decide by itself what to swap out and when. Secondly, the basic constructs of Linux process execution have no equivalents in DOS, even though there are library routines which do.

The simplest way to run another process is via the system() function, which calls a shell and feeds it a command string for interpretation and execution. All of the normal shell parsing and redirection is performed. This means that you should be careful of the arguments you pass if you don't want the shell to misinterpret any special characters.

This is a simple example of a program that shows who is logged on, and then who is logged on to the remote machine called “thrudd”.

void main(int argc, char *argv[])
{
  system("who");
  system("rsh thrudd who");
}

The Linux system makes heavy use of pipes—a way of feeding the output of one command into another. This is not just a shell facility. Any program can read or write from another using the popen() and pclose() calls. These work the same way as fopen() and fclose() do, save that popen() is passed a program as its argument. Because pclose() handles the termination of the process created by popen(), it is important to use the right close routine.

This brings us conveniently back to printing for our next example. Here is a set of subroutines for printing a file:

FILE *open_printer(char *printername)
{
  char buf[256];
  sprintf(buf,"lpr -P%s",printername);
  return popen(buf,"w");
}
void print_line(FILE *printer, char *line)
{
  fprintf(printer,"%s\n",line);
}
void close_printer(FILE *printer)
{
  pclose(printer);
}

Unlike DOS, Linux has no support for overlays in the linker, nor for loading overlays in the C libraries. The memory management and virtual memory system will swap unused program segments out of memory without assistance from the program being swapped and will automatically bring them back in when they are needed. Using overlays would not speed up program startup time, either. Program code is read from disk whenever the page of code in question is needed and not found resident in memory. The new ELF library format does support dynamic linking, and you could conceivably implement overlays using it. This is, however, pointless.

The core routines Linux provides for process execution are execve(), fork(), and wait(). The execve() call replaces the running program image with another one. The original is destroyed totally. The fork() call creates a new copy of the existing process. The only difference between the copies is the value returned by fork() (the process id of the new process in the parent, 0 in the child). Finally, the wait() call lets you wait for a process to complete. This level of control is unlikely to be needed when porting DOS programs, so they are not covered here.

Multitasking Politely

In general, the kernel automatically blocks programs, thus avoids having them use CPU time when they are waiting for I/O. A device can be opened with the extra option O_NDELAY to indicate that an error EWOULDBLOCK should be returned to indicate the lack of ready data (or for writing lack of buffer space). When a program is using I/O in this manner, it must take great care not to sit in a tight loop.

Avoid the following DOS-style constructs:

while(1)
{
  if(kbhit())
    do_something(getch());
  if(timer_expired())
    time_event();
}

Linux instead provides the very useful select() system call, which allows you to wait for multiple I/O events, a timeout, or both, in a manner that avoids polling and enables the kernel to avoid allocating processor resources to the task in question.

select() allows you to wait for a given time, or wait until one of a set of files has data ready to read or space to write, or wait until an exceptional condition occurs on that file. As the Linux system sees everything within reason as a file, this is extremely flexible.

Listing 1 shows is a “trivial” example. This is an implementation of kbhit(). For exact DOS behaviour, it assumes the terminal is already in raw mode, which we will discuss later. Otherwise, it will return 1 after ENTER is pressed, which is when data becomes available in the line-by-line cooked mode.

Sharp-eyed DOS programmers might wonder, “What happens with this kbhit() if we redirect the input of the program from a file? It's not a keyboard, but input is available.” The answer is simple enough—a disk file is always ready for reading, and at this level, there is no difference between reading a keyboard and reading a file. The program carries on and works fine. Indeed, you could redirect a program to run reading input from the mouse and select() would still behave consistently.

Files and Devices

File I/O under Linux is somewhat simpler than DOS. DOS emulates the Unix low level (open(), close(), read(), and write()) and high level “stdio” facilities, but DOS C libraries have their own ascii/binary awareness to handle the carriage return/line feed differences. Under Linux these are gone and there is no need to worry about specifying these (although the ascii/binary specification will be accepted). All of the DOS device names are different under Linux. Linux systems keep their devices in /dev. Here is a rough conversion chart:

    CON:         /dev/tty
    LPT1:        /dev/lp0
    LPT2:        /dev/lp1
    LPT3:        /dev/lp2
    COM1:        /dev/ttyS0 /dev/cua0
    COM2:        /dev/ttyS1 /dev/cua1
    COM3:        /dev/ttyS2 /dev/cua2
    COM4:        /dev/ttyS3 /dev/cua3
    NUL:         /dev/null

Note that it is normal to print by queueing jobs via the printing service (lpr) rather than writing directly to ports. On typical Linux systems the /dev/lp* files are protected so that a normal user cannot access them directly.

Terminal Input

Terminal I/O is distinctly different in Linux than it is under DOS. First, the POSIX terminal system is more modal than DOS. To switch from one-character-at-a-time mode (getch() in DOS) to a line-based editing mode requires an actual termios request, which gives the new terminal parameters to use. In addition, a program is responsible for restoring the terminal state before it runs other programs and when it exits. If you forget to do this, you may well need to switch screens and kill the process, or you may find that the shell gets confused by your terminal state and logs you out (which also fixes the problem).

Listing 2 includes some sample code for managing terminal I/O settings.

Terminal Output

On output, Unix programs traditionally avoid using direct cursor control codes and cannot write directly to video memory. The reasons for this are obvious. The terminal in question may be a different type of machine, in a different part of the world. Handling all the different terminal types by hand is unpleasant, so a library called curses is available. A more modern library called ncurses, which has such things as colour support, is also available for Linux. Older versions of this have had many bugs, but the latest appears very good indeed. See article “ncurses: Portable Screen Handling for Linux”, Linux Journal issue 17, September, 1995, for an introduction.

ncurses provides you with simple output control, colour (if the terminal supports it), function keys, and other manipulations in a terminal-independent manner. In addition, it optimises the updates it does to minimize traffic over slow networks or serial links. It is free and comes with a nice set of examples and good documentation. As it is an implementation of System V curses, you can pick up a book on curses from a library and use that as a reference or tutorial (as appropriate).

Should you decide to use ncurses to do your output, it will also provide all the routines necessary to do DOS style character-by-character input via the functions cbreak(), nocbreak(), echo(), and noecho(). The ncurses documentation explains all four.

Standard APIs for Mice

Until recently there was no standard control API for mice in text mode (in graphics mode X-Windows runs the mouse and provides all the facilities you would imagine a GUI to have). The normal mouse behaviour is to provide text mode cut and paste. This is managed by a program known as selection and more recently by gpm.

The gpm library allows your text mode application to handle mouse events both on the console and under X-Windows in xterm shell windows. Writing programs that use gpm to control the mouse is explained in article “Writing a Mouse-Sensitive Application” issue 17, September, 1995.

Terminate and Stay Resident Programs

Occasionally you will have to port a DOS terminate-and-stay-resident (TSR) program. If you have been accustomed to using undocumented DOS calls and switching stacks and other horrible assembly language goings-on you will be glad to know you can forget the experience.

First, the whole concept of terminate and stay resident is gone. When a program exits all its resources are freed, and the process ceases to exist. That does not mean the same facilities are not present; they are present in different ways that are more appropriate to a system which is already multitasking.

There are three main reasons for TSR programs under DOS.

To provide a library of subroutines for supporting some extended facility. Several loadable graphics libraries have used this facility. Under Linux you can create a new shared library and it will be available for linking with applications and sharing between multiple users.
To add a device driver. Device drivers are kernel code. Porting a DOS device driver will almost certainly be a major rewrite. Linux also has loadable device drivers via the modules support. Porting a DOS device driver is definitely beyond the scope of this article. In some cases the driver may be adding a high level facility that can be provided as a library or as an actual program left running all the time.
To create pop-up “hot key” based mini-applications like phone books. There is no reason for these under Linux. You have multiple console screens, the ability to have multiple screens on even a fairly dumb terminal with the iscreen program, and can run any application at any time. Thus, there is no need for mini-applications carefully patched into the kernel. You can just port it as a normal program.

For the second example, some TSR programs can be ported as if they were applications that provided services. The gpm mouse management is a fine example of this. It provides the core equivalents of the DOS mouse services interrupt facilities as an application program that runs in the background and a library of support routines which interface with the server.

Porting Graphical Applications

Graphical programs are much more complex to port, because the graphics hardware interface is not available. You can approach this two ways. First, svgalib provides the basic functions you need to port your application. Note that you cannot use the BIOS functions, because they are only available in 16bit mode. An svgalib application can be very fast (see linuxsdoom for a superb example), but cannot run remotely and is not easily ported beyond PC-based systems.

The second approach is to use X-Windows. This makes for a much harder port, as you will need to move to an event-based paradigm (akin to programming for MS Windows) and rewrite your interface dialog and menus into X widgets. Furthermore X-Windows programming is—initially at least—quite hard to get the hang of. The result, however, is a graphical program that is portable and can run remotely. X-Windows is generally slower than raw SVGA, as one might expect. There is, however, an extension (called Xshm) that Linux and most Unix systems include, which supports fast bitmap updating as occurs commonly in games.

In the “cheat box” there is also Tcl/Tk, a front end language for writing easy X-Windows interfaces. Its applicability to a given program is very hard to summarise. However, applications that are basically modal can generally make best use of Tcl/Tk. Using Tcl/Tk to write front ends for programs is covered in “Using Tcl and Tk from Your C Programs”, Linux Journal issue 10, February, 1995.

Alan Cox has been working on Linux since version 0.95, when he installed it to do further work on the AberMUD game. He now manages the Linux Networking, SMP, and Linux/8086 projects and hasn't done any work on AberMUD since November 1993. In real life he hacks ISDN routers for I2IT.

Resources

A commercial clone of Borland's BGI library is also available for Linux. If your program uses BGI graphics, this may be an attractive option. A shareware version (US$15 registration) should be available at sunsite.unc.edu in the file /pub/Linux/apps/graphics/bgi_library.tar.gz by the time you read this.

The sunsite.unc.edu ftp site also carries a wide variety of database tools, from simple libraries for handling reading/writing PC-style XBase files to SQL systems.

A Commercial package called FlagShip for porting clipper database programs directly to Linux is available. A demo version is available from ftp://ftp.wgs.com/pub2/wgs/Filelist

Load Disqus comments