Michael W. Bigrigg, Copyright 2004-2007
This chapter covers I/O programming in the C and C++ programming languages, based on the standard I/O library that is part of the C programming language. Programs use I/O to interact with the screen and keyboard, the file system, and the network.
All of the standard I/O routines and definitions are declared in the stdio.h header file. Our first program is, as always, a hello world program, shown in Figure 1. It is a simple program that displays a greeting on the screen using the printf function.
#include <stdio.h>

int main() {
    printf("hello world\n");
}
What escapes most people is that the printf function, like all I/O, can fail and report an error condition. For printf, the return value identifies any error that occurred while printing to the screen.
The return value is the number of characters transmitted, or a negative value if an error occurred. Most programmers object that if printing to the screen fails, there is no way to tell the user, since error messages are typically printed to the screen as well. Remember, however, that in C the main function returns a value to the operating system upon completion. In our first hello world program we were very casual; a more appropriate hello world program returns a value that tells the operating system whether the program itself completed successfully.
The return value of our program works the opposite way from many C function return values: zero signifies that the program succeeded, and a nonzero value (conventionally a small positive number such as 1, or EXIT_FAILURE from stdlib.h) signifies an error, as shown in Figure 2.
#include <stdio.h>

int main() {
    int n;
    /* "hello world\n" is 12 characters; any other count means printf failed */
    n = printf("hello world\n");
    if (n != 12)
        return 1;
    else
        return 0;
}
In the end, it is possible to convey an error to the user even when printing to the screen is not possible. Most programmers take printing to the screen for granted and never check the return value of printf. The problem is pervasive: most programs do not check whether their I/O functions return an error.
The partner to printf is scanf, which reads input from standard input, by default the keyboard. The scanf function returns the number of input items matched and assigned, or EOF if an input failure (end of file or a read error) occurs before the first item is matched. This is shown in Figure 3.
#include <stdio.h>

int main()
{
    int a;
    char b;
    int n;
    /* no trailing \n in the format: it would make scanf wait for more input */
    n = scanf("%i %c", &a, &b);
    if (n == EOF)
        return 1;
    else
        return 0;
}
The EOF marker is a macro defined in stdio.h. The C standard guarantees only that it is a negative int; on many platforms it is -1, but code should compare against EOF rather than -1. In the above example, n may be 2, 1, or even 0, depending on the input. Because n counts the items matched, a zero is not an error condition; it signifies only that no items matched the input.
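To see the possible results side by side, the following sketch uses sscanf, which behaves like scanf but reads from a string, so each input can be supplied directly. The function name parse_pair is our own, chosen for illustration.

```c
#include <stdio.h>

/*
 * Parses an integer and a character from line, using the same format as the
 * scanf example. Returns the number of items matched (0, 1, or 2), or EOF if
 * the input ended before the first item could be matched.
 */
int parse_pair(const char *line, int *a, char *b)
{
    return sscanf(line, "%i %c", a, b);
}
```

With this sketch, "12 x" yields 2, "12" yields 1 because the input ends before %c can match, "abc" yields 0 because it does not start with a number, and an empty string yields EOF. Note that %i, unlike %d, also accepts octal and hexadecimal input such as 0x1F.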
The printf and scanf routines do not let us choose where the I/O happens: they always use standard output and standard input, which by default are the screen and the keyboard. To use another device, we must access it explicitly, starting by opening it. The most commonly accessed device is the file system. We use fopen to open a file and obtain a file handle for it. The arguments to fopen are the name of the file and the access mode: a file can be opened for reading, writing, or appending, with the mode "r", "w", or "a" respectively. Other modes exist; for example, adding b to the mode (as in "rb" or "wb") opens the file in binary mode, which some systems, such as Windows, require for binary files.
The fopen routine returns NULL on failure; otherwise it returns a pointer to a FILE object, the file handle associated with the file that has just been opened. Many programmers assume that NULL is simply zero. Strictly, a null pointer compares equal to the constant 0 in C source code, but its internal representation is not guaranteed to be all-zero bits, so comparing the handle against NULL is the clearer and safer habit.
#include <stdio.h>

int main() {
    FILE *f;
    f = fopen("myfile.txt", "r");
    if (f == NULL)
        return 1;
    if (fclose(f) == EOF)
        return 1;
    return 0;
}
The file handle returned by fopen is used in subsequent calls that read from or write to the file. We can test whether a stream has reached the end of the file with the feof routine. The feof routine is unusual in that it has no way to report a failure. It returns a nonzero value if and only if the stream's end-of-file indicator is set, which happens only after a read has tried to go past the end of the file, not merely when the file position is at the end. A zero return therefore covers two cases: the stream has not reached the end of the file, or a read error stopped it first. The ferror routine tells the second case apart.
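A minimal sketch of the usual reading loop: read until fgetc returns EOF, then ask ferror whether the loop stopped because of an error rather than the end of the file. The function name count_chars and its -1 error convention are our own choices for illustration.

```c
#include <stdio.h>

/*
 * Counts the characters remaining in f. Returns the count, or -1 if the
 * loop was stopped by a read error rather than by the end of the file.
 */
long count_chars(FILE *f)
{
    long count = 0;
    int c;  /* int, not char, so that EOF stays distinct from every character */

    while ((c = fgetc(f)) != EOF)
        count++;

    /* fgetc returned EOF: was that the end of the file or an error? */
    if (ferror(f))
        return -1;
    return count;  /* here feof(f) is nonzero */
}
```

Note that after the last character of a file has been read, feof still returns zero; the indicator is set only by the next read attempt, the one that returns EOF.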
The fprintf and fscanf routines are similar to their counterparts without the f prefix: fprintf returns a negative number on error, just like printf, and fscanf returns EOF on error, just like scanf. The error values for fread and fwrite are very different. Each returns the number of elements actually transferred and reports an error by returning fewer elements than were requested. For fread, a short count can also mean end of file, so feof and ferror are needed to tell the two apart.
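The sketch below copies one stream to another and checks fread and fwrite in the way just described: a short fwrite count is treated as an error, and when fread stops returning data, ferror decides whether that was an error or simply the end of the file. The name copy_stream and the 4096-byte buffer are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Copies in to out. Returns 0 on success, or -1 if a read or write failed. */
int copy_stream(FILE *in, FILE *out)
{
    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        /* fwrite reports an error by writing fewer elements than requested */
        if (fwrite(buf, 1, n, out) < n)
            return -1;
    }

    /* fread returned 0: end of file or error, so ask ferror which one */
    if (ferror(in))
        return -1;
    return 0;
}
```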
After printf, the most commonly ignored error value is the one returned by fclose. On error, fclose returns EOF (it returns 0 on success), yet most programmers test its result against NULL instead.
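Checking fclose matters because output is usually buffered: data written with fprintf may reach the file only when the stream is flushed at close, so a failure such as a full disk can show up first as an EOF from fclose. A minimal sketch, with write_and_close as a name of our own choosing:

```c
#include <stdio.h>

/*
 * Writes text to f and then closes f, checking both calls. Returns 0 on
 * success or -1 on failure; f is closed either way.
 */
int write_and_close(FILE *f, const char *text)
{
    int failed = 0;

    if (fprintf(f, "%s", text) < 0)  /* fprintf: negative value on error */
        failed = 1;
    if (fclose(f) == EOF)            /* fclose: EOF on error, not NULL */
        failed = 1;
    return failed ? -1 : 0;
}
```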
Table 1 lists the standard I/O routines along with the values used to signify an error.
Network programming is not part of the standard C library; sockets are provided by the operating system, for example through the POSIX sockets API. Network programming with sockets, often called client-server programming, is nevertheless similar to terminal and file I/O. A socket server creates a socket and then waits to accept connections on it from clients, as shown in Figure 5. Communication with the client is done through remoteSocket, the new socket returned by accept.
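/* requires <string.h>, <sys/socket.h>, and <netinet/in.h>;
   port_number is an int holding the port to listen on */
int localSocket, remoteSocket;
struct sockaddr_in local, remote;
socklen_t remoteLen = sizeof(remote);

/* create a TCP/IP stream socket */
localSocket = socket(AF_INET, SOCK_STREAM, 0);
if (localSocket < 0)
    return 1;
/* assign a name to a socket (bind) */
memset(&local, 0, sizeof(local));
local.sin_family = AF_INET;
local.sin_addr.s_addr = htonl(INADDR_ANY);
local.sin_port = htons(port_number);
if (bind(localSocket, (struct sockaddr *)&local, sizeof(local)) < 0)
    return 1;
/* wait for clients to connect */
if (listen(localSocket, 5) < 0)
    return 1;
/* accept returns a new socket for the client and fills in its address */
remoteSocket = accept(localSocket, (struct sockaddr *)&remote, &remoteLen);
if (remoteSocket < 0)
    return 1;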
A socket client will create a socket and then connect that socket to a remote server socket, shown in Figure 6.
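/* requires <string.h>, <sys/socket.h>, <netinet/in.h>, and <netdb.h> */
int remoteSocket;
struct sockaddr_in remote;
struct hostent *hbn;

/* create a TCP/IP stream socket */
remoteSocket = socket(AF_INET, SOCK_STREAM, 0);
if (remoteSocket < 0)
    return 1;
/* look up the server's address; port_host holds its host name */
hbn = gethostbyname(port_host);
if (hbn == NULL)
    return 1;
/* connect to remote server */
memset(&remote, 0, sizeof(remote));
remote.sin_family = AF_INET;
memcpy(&remote.sin_addr, hbn->h_addr_list[0], hbn->h_length);
remote.sin_port = htons(port_number);
if (connect(remoteSocket, (struct sockaddr *)&remote, sizeof(remote)) < 0)
    return 1;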
Once the client and server are connected, they may alternate in sending and receiving data. The example code is in Figure 7.
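Such an exchange can be sketched with two helpers that check every return value. send and recv return -1 on error and may transfer fewer bytes than requested, and recv returns 0 once the peer has closed the connection, so both helpers loop until the whole buffer has been handled. The names send_all and recv_all are our own; they work on any connected stream socket, such as remoteSocket in the client and server fragments above.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Sends all len bytes. Returns 0 on success, -1 on error. */
int send_all(int sock, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(sock, buf, len, 0);
        if (n < 0)
            return -1;          /* error: errno says why */
        buf += n;               /* send may transfer only part of the buffer */
        len -= (size_t)n;
    }
    return 0;
}

/* Receives exactly len bytes. Returns 0 on success, or -1 on error or if
   the peer closed the connection early (recv returned 0). */
int recv_all(int sock, char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = recv(sock, buf, len, 0);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

For a quick test without a network, socketpair(AF_UNIX, SOCK_STREAM, 0, sv) creates two connected sockets, so send_all on sv[0] can be matched by recv_all on sv[1].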
Our first generation of exception injection tools was originally called FlakyIO; the name was changed to FlakyC as we began to develop other I/O-based exception injection systems. FlakyC consists of a fault injection engine that is statically linked with the application, whose source code has been altered by a C source-code modification tool. Exceptions were raised at the function call boundary. We report the results of running each application while emulating a transient exception on the first call to each function. Each application either handled the exception correctly (HC), handled it incorrectly (HI), or failed silently (S). Most applications also reported their errors through a printf-style call without checking whether the error report itself succeeded (S2).
We applied the testing harness to a specific class of applications: those that are I/O based and use the C standard I/O library. FlakyC was first applied to the GNU binutils and the GNU textutils. We tested some 26 text utilities and 15 binary utilities written by several authors. All are written in C and rely heavily on the C standard I/O library; they have been ported to numerous platforms and are a widely accepted and used set of utilities. Our experiments were performed on Linux. The source files for the utilities, as well as the supplemental FlakyC libraries, were run through the C preprocessor and the test harness source translator, and then compiled. We first ran each application in its normal operating mode and then in our test mode.
Our original observations were that some applications do proper error testing and some do not. For instance, cksum fails silently when its fread() call fails. The cksum program checks whether the result of the fread() call is less than or equal to zero, expecting zero at end of file and a negative value on error. We have marked this program as not handling the exception. This is only partially correct, as cksum also calls feof() after the fread() to determine whether it is at the end of the file. While cksum will catch a permanent failure of the disk, it will not be able to handle transient errors.
Csplit is a utility that splits a file into sections determined by context lines. When fwrite is made to fail in csplit, two files are created, and csplit even displays what their correct sizes should be, but the files themselves are empty. No error is given and the application exits normally.
When fclose fails, an error is almost always reported. However, the program does not always exit gracefully, nor does it give clear information about what error occurred or where. For example, when fclose() fails in the join utility, the error printed is: "./join_flaky: k: ,≤ÿ¿,≤ÿ¿ ".
Next, we applied FlakyC to a MySQL database, using the same configuration as for the GNU textutils: the exception engine on the application side, a function call boundary, and a transient exception raised on the first call to a function. The results were initially not what we had expected. Instead of a mixed bag of results similar to the textutils, we found that no matter what the test, the result was the same: "good bye". The C I/O routines had been wrapped so that the program exits whenever an error condition occurs.
We have grown to like the "good bye" message. It is a simple and graceful exit from the application, and an acceptable way of handling the exception. We did not examine the code to determine whether the database was left in a correct state.
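The wrapping approach can be sketched as a set of functions that make each I/O call, check its result, and exit with a short message on failure. This is an illustration of the idea, not MySQL's actual code; the names x_fopen, x_fwrite, x_fclose, and die are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: report which call failed, say good bye, and exit. */
static void die(const char *call)
{
    fprintf(stderr, "%s failed, good bye\n", call);
    exit(EXIT_FAILURE);
}

/* Each wrapper makes the real call and exits if it reports an error. */
FILE *x_fopen(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        die("fopen");
    return f;
}

size_t x_fwrite(const void *buf, size_t size, size_t count, FILE *f)
{
    size_t n = fwrite(buf, size, count, f);
    if (n < count)
        die("fwrite");
    return n;
}

void x_fclose(FILE *f)
{
    if (fclose(f) == EOF)
        die("fclose");
}
```

Because every call site goes through the wrappers, no caller can ignore an error, at the cost of giving the program no chance to recover.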
After analyzing how all of the utilities handle different error situations, four categories of behavior became apparent: handles correctly (HC), handles incorrectly (HI), silent failure (S), and silent failure (S2) in a function used to report another function's failure.
Handles correctly (HC) is when an application recognizes that an error has occurred: the calling function acknowledges the error return value and displays an appropriate error message. This is correct behavior.
Handles incorrectly (HI) is similar to handles correctly inasmuch as the application acknowledges that an error has occurred and that an error return value has been received, but the error message it displays is not accurate.
Silent failure (S) occurs when an application neither crashes nor acknowledges any error, even though the function returned an error value. The application continues its execution as if no error occurred.
The final category is also a silent failure (S2), but here the function in question was being used to produce an error message because another function had failed. The primary error is acknowledged, but the function used to produce the error message is not itself checked for errors.
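The difference between HC and S2 comes down to whether the error report is itself checked. A minimal sketch, assuming a convention of our own (not from the source) in which a program exits with status 1 when an error was reported and 2 when even the report could not be written; report_failure and EXIT_REPORT_FAILED are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical exit status for "the error message itself could not be written". */
#define EXIT_REPORT_FAILED 2

/*
 * Reports msg on err after some other I/O call has failed and returns the
 * exit status the program should use. The report is itself I/O, so it is
 * checked as well; leaving it unchecked is the S2 pattern.
 */
int report_failure(FILE *err, const char *msg)
{
    if (fprintf(err, "%s\n", msg) < 0 || fflush(err) == EOF)
        return EXIT_REPORT_FAILED;  /* only the exit status can carry the news */
    return 1;                       /* error reported; ordinary failure status */
}
```

A caller in main might write if (fclose(f) == EOF) return report_failure(stderr, "fclose failed"); so that the operating system learns of the failure even when the screen cannot.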
Our primary observation is that silent failure is the most common case. Silent failures do not always produce no output; very often they produce corrupted data. Additionally, output is less likely than input to be checked for an error condition.
In a silent failure, a function call fails to read from or write to a file, and the application either reports nothing at all or reports false data. For example, when fread fails in the cksum utility, the output is 0 for the number of bytes in the file, along with an incorrect checksum. No error is reported even though the function returned the correct error condition.
We first developed a testing harness that lets us selectively manipulate the behavior of function calls in the application source code. The test harness modifies the application at the source code level. It consists of a source-level translator, a run-time library, and a control driver.
The source-level translator is the front end of a C compiler. The C preprocessor is first run on all source code before it is passed to the translator, as shown in Figure 1. The translator builds a parse tree, which it uses to modify the source code by adding conditional statements that give control over specific function calls. For example, a normal fread function call may look like this:
cc = fread(buf->buf+buf->used, 1,
buf->alloc-1-buf->used, fp);
At the point of the function call, the translator inserts a conditional around the call that gives the driver control over whether fread executes. This is shown below:
cc = (flaky_fread ( 14262 )
? flakyerr_fread ()
: fread(buf->buf+buf->used, 1,
buf->alloc-1-buf->used,fp));
The flaky_<function>(<line number>) routine calls the driver to determine whether to execute the function or fail it. It takes the line number where the function call appears in the code, which helps in analyzing the results when several function calls are being made. The control driver can fail a specific instance of a call or all instances. For example, fread can be failed at a given point in a program by returning its error code instead of executing the function. A behavior control input file specifies when each function should fail: always, consecutively, or on a certain line number.
A log is generated that records the progress of the application and its calling pattern.
The flakyerr_<function> routine returns the appropriate error value for the specified function. For example, the C standard I/O library fread call returns the number of elements actually read, which is never more than the number requested; a smaller count (including 0) means that the end of the file was reached or that an error occurred, and feof and ferror tell the two apart. The error value returned by flakyerr_fread is -1.
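The shape of a translated call can be tried out in isolation. In the sketch below, sketch_flaky_fread and sketch_flakyerr_fread are hypothetical stand-ins for the driver's routines, not FlakyC code: the stand-in driver simply counts calls and fails the one whose number is stored in fail_on. Unlike flakyerr_fread, which returns -1, the stand-in returns 0, the short count the standard fread uses to signal a failure.

```c
#include <stdio.h>

/* Hypothetical stand-in driver state: fail only call number fail_on (0 = never). */
static int fread_calls = 0;
static int fail_on = 0;

/* Stand-in for flaky_fread(line): log the call, then decide whether to fail it. */
static int sketch_flaky_fread(int line)
{
    fread_calls++;
    fprintf(stderr, "fread() called Line: %d\n", line);
    return fread_calls == fail_on;
}

/* Stand-in for flakyerr_fread(): the short count a failing fread returns. */
static size_t sketch_flakyerr_fread(void)
{
    return 0;
}

/* A call site after translation: the real fread runs only if no fault is injected. */
size_t read_block(char *buf, size_t len, FILE *fp)
{
    return sketch_flaky_fread(__LINE__)
        ? sketch_flakyerr_fread()
        : fread(buf, 1, len, fp);
}
```

When a fault is injected, the real fread is never called, which is why the error value alone must be enough for the caller to react to.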
We will walk you through an example of using FlakyC with the code in Figure 9.
main()
{
int i;
i = fopen("foo.txt", "r");
fclose(i);
}
In this program, we deliberately do not handle the possible error conditions that could result from the fopen or fclose calls. FlakyC allows us to inject a fault into the fopen or fclose call, emulating, for instance, a network fault in which the distributed file server is unavailable.
1. Preprocess the Source Code
As with any C compiler system, all preprocessor directives (e.g., #if) are handled by the preprocessor. If your program contains preprocessor directives, first run it through the preprocessor, typically cpp.
With the GNU C/C++ compiler, for example, the -E option tells the compiler driver to run the preprocessing phase (cpp) and then stop. The output is a preprocessed version of the original source file:
gcc -E example.c > example.i
It is not necessary to preprocess our sample program, as it does not use any preprocessor directives. Remember that #include is also a preprocessor directive.
If we did preprocess the example file, the only change would be the addition of a #line directive that guides the compiler in producing error messages for syntax errors in the code.
main()
{
int i;
i = fopen("foo.txt","r");
fclose(i);
}
2. Install the I/O Fault Injection Wrappers
The flake program modifies your source program to wrap your I/O function calls for selective failing. These wrappers interface with our fault injection driver. The flake program was built under SuSE Linux 7.0 and should work under most Linux implementations. If you have a problem getting it to run, please let us know. It takes no arguments: it reads your program from standard input and writes the modified program to standard output.
./flake < example.i > example2.c
You could have submitted example.c as input if you did not have to do any preprocessing. We redirect the output to the file that will contain the modified program. The values passed to the flaky calls are line numbers.
main ()
{
int i ;
i =flaky_fopen(6)?flakyerr_fopen()
:fopen ("foo.txt", "r");
flaky_fclose(7)?flakyerr_fclose()
:fclose (i );
}
Error reporting in our implementation is currently very crude. If the submitted program does not pass ANSI C lexical and syntax analysis, flake reports only the message below; run the program through an actual compiler to find out what is wrong with it.
ERROR: File to be processed must be error free.
3. Link the Fault Injection Engine into the Application
The fault injection engine is distributed as an object file which you link into your modified application code. The engine controls the application and inserts faults as directed.
gcc -o example example2.c flaky_utils.o
This links in the flakyerr_fopen and flakyerr_fclose routines, which return an error value to the calling routine. If we choose to fail a routine, we do not call the actual function at all. We can do that because I/O is an all-or-nothing activity.
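#include <stdio.h>

/* error value delivered in place of a failed fopen */
FILE *flakyerr_fopen(void)
{
    return NULL;
}

/* error value delivered in place of a failed fclose */
int flakyerr_fclose(void)
{
    return EOF;
}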
We also link in the flaky_fopen and flaky_fclose routines.
int flaky_fopen (int a)
{
initiate_log ();
fprintf (run_time_log, "fopen() called");
fprintf (run_time_log, " Line: %d ", a);
if (flaky_control (1, 15))
return 1;
else
return 0;
}
int flaky_fclose (int a)
{
initiate_log ();
fprintf (run_time_log, "fclose() called");
fprintf (run_time_log, " Line: %d ", a);
if (flaky_control (1, 13))
return 1;
else
return 0;
}
The initiate_log() routine opens the log file referenced by the run_time_log file handle. The log provides a record of which routines were called and which ones were failed. We have used it as an initial way to understand which calls are made during the execution of the program and as a way to flag which calls should be failed.
4. Identify the Fault Behavior
The behavior control file is read by the fault injection engine as a specification of how faults should be injected into the application. Currently, faults are specified per function: a function call can be failed always, or only on a particular instance of the call. Here is part of the sample control file:
printf 0 2
fopen 1 0
fclose 0 0
The file is organized as follows. The first field is the name of the function to control; the example above controls printf, fopen, and fclose. The second field is either 1, meaning always fail this routine, or 0, meaning do not always fail it. The last field identifies which specific instance, if any, should be failed. If the second field is 0, the third identifies the single instance that should be failed. If the second field is 1, the third identifies the instance from which the function should start failing.
Given our above example, the fault injection engine will never specifically fail the fclose call, will fail every instance of fopen, and will fail the second and only the second call to printf. Of course since our sample program does not make a call to printf, that line has no effect.
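One way to read the control file is sketched below. It follows the field meanings just described and, as assumptions of our own, numbers calls from 1 and treats an instance of 0 with a second field of 0 as "never fail"; with those assumptions it reproduces the behavior described for the sample file. The names control_rule, parse_rule, and should_fail are hypothetical and are not the FlakyC engine's.

```c
#include <stdio.h>

/* One parsed line of the behavior control file: "<function> <always> <instance>". */
struct control_rule {
    char name[64];
    int always;    /* second field: 1 = always fail, 0 = do not always fail */
    int instance;  /* third field: the instance to fail, or to start failing at */
};

/* Parses a line such as "printf 0 2". Returns 1 on success, 0 otherwise. */
int parse_rule(const char *line, struct control_rule *r)
{
    return sscanf(line, "%63s %d %d", r->name, &r->always, &r->instance) == 3;
}

/* Decides whether call number `call` (counting from 1) should be failed. */
int should_fail(const struct control_rule *r, int call)
{
    if (r->always)
        return call >= r->instance;  /* "fopen 1 0": every call fails */
    return call == r->instance;      /* "fclose 0 0": no call is numbered 0 */
}
```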
To make the most effective use of FlakyC, run your application in a failure-free environment, so that the only faults your application encounters are the ones FlakyC injects. With our sample test program and the behavior above, which fails fopen, the application crashes when fclose attempts to close the NULL file handle that fopen returned to signal the error. You can also examine the log file, which shows the I/O calls made, which ones were failed, and the error that was delivered to the calling application.
fopen() called Line: 5 Always fail
fclose() called Line: 6 Never fail
The heart of the FlakyC environment is the source code conversion tool.
An alternative to a source code transformation tool is an aspect-oriented programming weaver. A weaver modifies source code based on a pattern: just as it can place logging code around certain calls, it can place our selective exception routine around them.