The write() system call is one of the fundamental building blocks for performing file I/O operations in C. It allows a program to write data from a buffer in memory to a file descriptor. In this comprehensive guide, we will dive deep into how to properly use write() in C programs.

Overview of the write() System Call

Here is the function prototype for write():

ssize_t write(int fd, const void *buf, size_t count);

It takes three arguments:

  • fd: The file descriptor to write to. This would be obtained by a previous open() system call.
  • buf: Pointer to the buffer containing the data to write.
  • count: The maximum number of bytes to write.

The write() function attempts to write up to count bytes from buf to the file referenced by the file descriptor fd.

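On success, write() returns the number of bytes written. This can be less than count, for example when the disk fills up, when a signal interrupts the call after some data has been written, or when fd refers to a pipe or socket that can only accept part of the data right now.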

On error, -1 is returned, and errno is set appropriately.
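
To make the two outcomes concrete, here is a minimal sketch. The helper name write_once() is hypothetical and not part of any library; it performs a single write() and reports either the byte count or the errno value:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: performs one write() and separates the two outcomes.
 * Returns 0 and stores the byte count in *written on success,
 * or returns the errno value (and stores 0) on failure. */
int write_once(int fd, const void *buf, size_t count, size_t *written)
{
    ssize_t n = write(fd, buf, count);
    if (n == -1) {
        *written = 0;
        return errno;          /* e.g. EBADF, ENOSPC, EINTR */
    }
    *written = (size_t)n;      /* may be smaller than count */
    return 0;
}
```

A caller would compare *written with count to detect a partial write.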

Opening a File for Writing

Before we can write to a file with write(), we need to open the file using open(). Here is an example:

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

int fd;
fd = open("file.txt", O_WRONLY | O_CREAT, 0644);
if (fd == -1) {
    // error
}

This opens "file.txt" for writing only (O_WRONLY), creating it if it doesn't exist (O_CREAT), with read/write permissions for the owner and read permissions for group and others (0644, further restricted by the process umask).

The open() call returns a file descriptor fd that can be used in subsequent write() calls.

Some key points:

  • O_WRONLY opens the file write-only. Use O_RDWR to open for reading and writing.
  • Always check for errors from open() before proceeding.
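  • An existing file is not truncated by these flags: writes start at offset 0 and overwrite the old bytes in place. Add O_TRUNC to empty the file first, or O_APPEND to append instead.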

With the file open for writing, we can now look at calling write().
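
The flags decide what happens to a file that already exists. As a rough sketch, a small wrapper (open_output() is a hypothetical name) could make that choice explicit:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: open `path` for writing, creating it if needed.
 * If discard_old is nonzero, O_TRUNC empties an existing file;
 * otherwise existing bytes stay and writes start at offset 0.
 * Returns the descriptor, or -1 with errno set by open(). */
int open_output(const char *path, int discard_old)
{
    int flags = O_WRONLY | O_CREAT;
    if (discard_old)
        flags |= O_TRUNC;
    return open(path, flags, 0644);
}
```

The mode 0644 is only used when the file is actually created.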

Writing Data to a File

The simplest way to use write() is to specify a buffer and length to write:

#include <string.h>   // strlen
#include <unistd.h>   // write

const char *msg = "Hello World";
size_t len = strlen(msg);   // strlen() returns size_t

ssize_t bytes_written = write(fd, msg, len);

This will attempt to write the contents of msg ("Hello World") to the file descriptor fd.

The return value stored in bytes_written tells us how many bytes were successfully written.

We should always check for errors after calling write():

if (bytes_written == -1) {
    // error occurred; errno holds the reason
} else if ((size_t)bytes_written != len) {
    // partial write: only part of the buffer was written
}

  • -1 means an error occurred, such as a bad file descriptor (EBADF) or a full disk (ENOSPC).
  • If bytes_written is less than len, the write was partial. This can happen when storage runs out, when a signal arrives mid-call, or when writing to a pipe or socket. The remaining bytes must be written with another call.

Otherwise, the write was successful and we wrote exactly the number of bytes we intended to.
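
A common way to deal with partial writes is to loop until the whole buffer has been accepted. The following is a minimal sketch of that idea; write_all() is a hypothetical helper name, and retrying on EINTR is a design choice that may not suit every program:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `count` bytes are out.
 * Returns 0 on success, -1 on error (errno is left as write() set it). */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;                 /* skip the bytes that were accepted */
        count -= (size_t)n;
    }
    return 0;
}
```

With this in place, write_all(fd, msg, len) either writes the whole message or reports an error.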

Writing Binary Data

For binary data, we simply replace the text buffer with pointers to structures:

struct data {
   int values[100];
};

struct data d;
// populate structure

ssize_t bytes_written = write(fd, &d, sizeof d); 

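This writes the in-memory bytes of d to the file exactly as they are. No formatting is applied, which also means the file layout depends on the compiler's struct padding and the machine's byte order, so such files are generally not portable between platforms.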
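
As an illustration, here is a small sketch that writes one record and reads it back. The struct sample type and both helper names are made up for this example, and the resulting bytes are only meaningful to a program built for the same platform:

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical record type; fixed-width fields keep the layout predictable
 * on one platform, but padding and byte order are still ABI-specific. */
struct sample {
    uint32_t id;
    double   value;
};

/* Write one record; returns 0 only if the whole struct was written. */
int save_sample(int fd, const struct sample *s)
{
    ssize_t n = write(fd, s, sizeof *s);
    return n == (ssize_t)sizeof *s ? 0 : -1;
}

/* Read one record back; returns 0 only if a whole struct was read. */
int load_sample(int fd, struct sample *s)
{
    ssize_t n = read(fd, s, sizeof *s);
    return n == (ssize_t)sizeof *s ? 0 : -1;
}
```

Both helpers treat a short read or write as a failure, which keeps the sketch simple but is stricter than necessary for large records.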

Appending Data to a File

To append rather than overwrite, open the file with O_APPEND:

fd = open("file.txt", O_WRONLY | O_APPEND); 

Any writes now go to the end of the file.

We can also enable/disable O_APPEND after opening using fcntl():

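// F_SETFL replaces all status flags, so read the current ones first
int flags = fcntl(fd, F_GETFL);

// enable append
fcntl(fd, F_SETFL, flags | O_APPEND);

// disable append (there is no F_CLRFL command)
fcntl(fd, F_SETFL, flags & ~O_APPEND);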

This allows switching between normal writes and appending without closing/reopening the file.
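
Wrapped up with error checking, toggling append mode might look like the sketch below. The function name set_append() is hypothetical; it reads the current status flags first so that other flags such as O_NONBLOCK are preserved:

```c
#include <fcntl.h>

/* Hypothetical helper: turn O_APPEND on (enable != 0) or off for an open fd
 * while preserving the other file status flags. Returns 0 or -1. */
int set_append(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    if (enable)
        flags |= O_APPEND;
    else
        flags &= ~O_APPEND;

    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}
```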

Avoiding Interleaved Writes

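With multiple threads or processes writing to a single file, output can get interleaved or overwritten, especially when one logical record is split across several write() calls.

For example, if process A writes "Hello " as two calls ("Hel" and "lo ") and process B writes "World!" between them, a file opened in append mode may end up containing "HelWorld!lo ".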

To avoid this race condition, we need to use some form of file locking.

POSIX specifies advisory locking through fcntl(). Some systems have additionally offered mandatory locking as a non-standard extension.

Advisory Locking

Advisory locks only affect processes that cooperate by acquiring the lock before they access the file. The kernel does not stop a process that ignores the lock and writes anyway.

For example:

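// set exclusive lock on the whole file
struct flock fl = {0};
fl.l_type = F_WRLCK;
fl.l_whence = SEEK_SET;
fl.l_start = 0;
fl.l_len = 0;   // 0 = lock through end of file
if (fcntl(fd, F_SETLK, &fl) == -1) {
    // EACCES or EAGAIN: another process holds a conflicting lock
}

// write data...

// clear lock
fl.l_type = F_UNLCK;
fcntl(fd, F_SETLK, &fl);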

By using F_SETLK with an exclusive lock (F_WRLCK), we ensure only one process at a time can hold the advisory lock when writing to the fd.

If another process already holds the lock, our F_SETLK will fail rather than wait for the lock. We can use F_SETLKW to block if desired until the lock is available.

So advisory locking only works reliably among cooperative processes that check lock state before writing.
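
Putting lock, write, and unlock together, a cooperating writer could look roughly like this. The helper locked_write() is a hypothetical name, it locks the whole file rather than a byte range, and it uses F_SETLKW so that it waits for the lock instead of failing:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take a whole-file write lock, write once, unlock.
 * Returns the result of write(), or -1 if the lock could not be taken. */
ssize_t locked_write(int fd, const void *buf, size_t count)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;      /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means through end of file */

    if (fcntl(fd, F_SETLKW, &fl) == -1)   /* blocks until the lock is free */
        return -1;

    ssize_t n = write(fd, buf, count);
    int saved_errno = errno;  /* keep write()'s errno across the unlock */

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);  /* unlock errors are ignored in this sketch */

    errno = saved_errno;
    return n;
}
```

Note that fcntl() locks belong to the process, so they coordinate separate processes but not threads within one process.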

Mandatory Locking

With mandatory file/record locking, attempts to access a locked file region will always fail or block, regardless of whether processes check lock state explicitly.

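On Linux, mandatory locking had to be enabled both for the filesystem and for the individual file, and support was removed entirely in kernel 5.15, so new code should not rely on it. Where it was available, the setup was:

// the filesystem must be mounted with the "mand" option (mount -o mand ...)
// mark the file: set-group-ID bit on, group-execute bit off
fchmod(fd, S_ISGID | 0644);

Locks are then taken with fcntl() exactly as for advisory locking. A read() or write() that touches a region locked by another process blocks until the lock is released, or fails with EAGAIN if the descriptor is in non-blocking mode.

Mandatory locking protects against processes that do not check locks themselves. The downsides are the extra lock check on every I/O call and, on Linux, documented race conditions that made the implementation unreliable.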

So in summary:

  • Advisory locks – processes must cooperate and check locks before reading/writing
  • Mandatory locks – reads/writes automatically blocked if locked by another process

Efficient File Writing Using Buffering

System calls like write() require context switches from user mode to kernel mode. This can limit performance, especially for small writes.

Buffering writes can help by reducing total system calls:

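#include <string.h>   // memcpy
#include <unistd.h>   // write

#define BUF_SIZE 8192

static char buf[BUF_SIZE];
static size_t filled = 0;

// Write out everything queued so far. Returns 0 on success, -1 on error.
int write_buffer(int fd) {
    size_t off = 0;
    while (off < filled) {
        ssize_t n = write(fd, buf + off, filled - off);
        if (n == -1) {
            return -1;   // error handling is left to the caller
        }
        off += (size_t)n;   // partial write: continue with the rest
    }
    filled = 0;   // reset buffer
    return 0;
}

// Queue len bytes (len <= BUF_SIZE), flushing first if they would not fit.
int add_to_buffer(int fd, const char *data, size_t len) {
    if (len > BUF_SIZE) {
        return -1;   // too large for this simple buffer
    }
    if (len > BUF_SIZE - filled && write_buffer(fd) == -1) {
        return -1;
    }
    memcpy(buf + filled, data, len);
    filled += len;
    return 0;
}

// call add_to_buffer() to queue writes, and write_buffer() once at the end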

By buffering up writes and only calling write() periodically or when the buffer fills, we reduce the number of system calls required.

For workloads that issue many small writes, buffering like this can increase performance significantly.

Parallel Writes Using Async I/O

With POSIX asynchronous I/O, a program can start a write and continue with other work while the write is carried out in the background.

The basic process is:

  1. Fill in a struct aiocb describing the write (descriptor, buffer, length, file offset)
  2. Initiate async write using aio_write()
  3. Process can continue other work while write happens in background
  4. Check status with aio_error(), or wait for completion, then collect the result with aio_return()

For example:

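#include <aio.h>      // may need -lrt when linking on older glibc
#include <errno.h>
#include <string.h>

// describe the write; fields we do not use must be zero
struct aiocb cb;
memset(&cb, 0, sizeof cb);
cb.aio_fildes = fd;
cb.aio_buf = buffer;
cb.aio_nbytes = len;
cb.aio_offset = 0;   // file position to write at
if (aio_write(&cb) == -1) {
    // the request could not be queued
}

// do other work...

// wait for write to finish without busy-polling
const struct aiocb *const list[] = { &cb };
while (aio_error(&cb) == EINPROGRESS) {
    aio_suspend(list, 1, NULL);
}

// get status: bytes written, or -1 if the write failed
ssize_t ret = aio_return(&cb);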

By using asynchronous I/O, we can queue several write operations at once and overlap them with computation, which helps keep the storage device busy.

The disadvantage of async I/O is added software complexity to manage multiple operations. So it mainly benefits high-performance servers doing heavy I/O.

Security Considerations

When writing files based on external or user-supplied input, be aware of the security risks, such as:

Directory Traversal

Attackers may try to manipulate paths to write files outside of the expected directories:

// vulnerable code

const char *filename = get_user_input();
FILE *fp = fopen(filename, "w"); // problem: the path is used unchecked

fwrite(data, 1, len, fp);

By inputting paths like "../../../../etc/passwd", attackers can write files anywhere the process has write permission.

To avoid directory traversal attacks:

  • Reject user-supplied paths that contain .. components instead of trying to strip them
  • Call realpath() to resolve paths before opening, and check that the result lies inside the intended directory
  • Store files in a dedicated directory, not directly in /

Symbolic Links

A symbolic link is a special file that points to another file or directory:

ln -s /home/user/important_file symlink

If an attacker can create arbitrary symlinks, they could cause writes to overwrite critical files:

// vulnerable code

FILE *fp = fopen(user_input, "w");
fwrite(data, 1, len, fp);

If the path names a symlink pointing to /etc/passwd, writing through fp will modify that file!

To avoid issues with symlinks:

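  • Open the file with the O_NOFOLLOW flag so that open() fails if the final path component is a symlink
  • Validate the opened descriptor with fstat() rather than checking the path beforehand, which avoids a race between the check and the open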

So in summary, be vigilant about sanitizing external input used for filenames or paths passed to open() before the data is written. Use available mechanisms like open flags and access checks to prevent overwriting unexpected filesystem locations.

Conclusion

The write() system call is fundamental to writing data to files in C. With proper error checking, input validation, and concurrency control, it can be used securely and efficiently.

Key takeaways include:

  • Open files for writing using flags like O_WRONLY and O_APPEND
  • Always handle errors: check the return value and inspect errno
  • Use locking where needed to prevent interleaved writes
  • Employ buffering and asynchronous I/O for better performance
  • Take care to sanitize all filesystem paths and filenames

By understanding these best practices for write(), you can build robust applications in C that store and manipulate data files with confidence. The techniques outlined here should provide a solid foundation for leveraging files in your programs.
