File and Directory Management

Files and Their Metadata

Every file in a Unix system is represented by an inode, which is both a conceptual object and a physical structure on the filesystem that stores the file’s metadata. Inodes are identified by inode numbers, which are unique within a filesystem. The inode records access permissions, timestamps, owner, group, and the location of the file’s data, but not the filename itself.

  • The stat Family: A suite of system calls used to retrieve a file’s inode metadata.
    • stat(path, buf): Retrieves metadata for the file resolved by the given path.
    • fstat(fd, buf): Retrieves metadata for an open file descriptor.
    • lstat(path, buf): Like stat, but if path is a symbolic link, retrieves metadata for the link itself rather than for its target.
  • The stat Structure: Populated by the stat calls, containing the following core fields:
    • st_dev: ID of the device on which the file resides. Files on filesystems with no backing device (e.g., NFS or procfs) report an anonymous device ID, whose major number is 0.
    • st_ino: The file’s filesystem-unique inode number.
    • st_mode: File type and access permissions.
    • st_nlink: Number of hard links pointing to this inode.
    • st_uid / st_gid: The user ID of the file’s owner and the group ID of the file’s group.
    • st_rdev: The device ID, utilized only if the file is a special device node.
    • st_size: Total file size in bytes.
    • st_blksize: Optimal block size for user-buffered I/O.
    • st_blocks: Number of blocks allocated to the file, in 512-byte units on Linux (fewer than st_size would suggest if the file is sparse).
    • st_atime, st_mtime, st_ctime: Timestamps for last access, last modification (data write), and last status change (metadata alteration), respectively.
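As a minimal sketch, assuming Linux with glibc, the hypothetical helper describe_file() below calls lstat() (so a symbolic link is described rather than followed), decodes the file type from st_mode with the standard S_IS* macros, and prints several of the fields listed above. The single-character type codes mimic ls -l and are an illustrative choice, not part of the stat API.

```c
#define _XOPEN_SOURCE 700  /* expose POSIX/XSI declarations such as lstat() */
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: print selected inode fields for path and return an
 * ls-style type character ('-', 'd', 'l', 'c', 'b', 'p', 's'), or 0 on error.
 */
char describe_file(const char *path)
{
    struct stat sb;

    /* lstat() reports on a symbolic link itself, not on its target */
    if (lstat(path, &sb) == -1) {
        perror("lstat");
        return 0;
    }

    char type = '?';
    if (S_ISREG(sb.st_mode))       type = '-';
    else if (S_ISDIR(sb.st_mode))  type = 'd';
    else if (S_ISLNK(sb.st_mode))  type = 'l';
    else if (S_ISCHR(sb.st_mode))  type = 'c';
    else if (S_ISBLK(sb.st_mode))  type = 'b';
    else if (S_ISFIFO(sb.st_mode)) type = 'p';
    else if (S_ISSOCK(sb.st_mode)) type = 's';

    printf("%s: type=%c inode=%ju links=%ju size=%jd bytes "
           "blocks=%jd (512-byte units) perms=%04o\n",
           path, type,
           (uintmax_t) sb.st_ino, (uintmax_t) sb.st_nlink,
           (intmax_t) sb.st_size, (intmax_t) sb.st_blocks,
           (unsigned int) (sb.st_mode & 07777));
    return type;
}
```

For example, on a typical Linux system describe_file("/") returns 'd' and describe_file("/dev/null") returns 'c'.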

Changing the permission and ownership fields reported in the stat structure requires dedicated system calls.

Permissions and Ownership

File access is governed by permissions and ownership rules evaluated against the process’s effective User ID (UID) and Group ID (GID).

  • Permissions:
    • chmod(path, mode) and fchmod(fd, mode) modify file permission bits.
    • The mode parameter accepts bitwise-ORed POSIX constants (e.g., S_IRUSR | S_IWUSR).
    • Changing permissions requires either that the calling process’s effective UID match the file’s owner or that the process hold the CAP_FOWNER capability.
  • Ownership:
    • chown(path, owner, group) changes ownership, automatically following symbolic links to alter the target.
    • lchown(path, owner, group) alters the ownership of a symbolic link directly.
    • fchown(fd, owner, group) alters ownership via an open file descriptor.
    • Passing -1 as the owner or group argument leaves that specific attribute unchanged.
    • Changing a file’s owner requires the CAP_CHOWN capability (in practice, root). Without it, the file’s owner may change only the file’s group, and only to a group the process belongs to.
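The calls above combine naturally when a file must end up with exact permissions. The sketch below uses a hypothetical helper, create_private(), and assumes a POSIX system such as Linux. It relies on the fact that open()’s mode argument is filtered through the umask at creation time while fchmod() is not, and it passes (uid_t) -1 to fchown() so that only the group can change.

```c
#define _XOPEN_SOURCE 700  /* expose POSIX/XSI declarations such as fchown() */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or truncate) path so that it ends up with
 * mode 0600 exactly, whatever the umask, and set its group to gid while
 * leaving the owner untouched. Returns an open descriptor or -1.
 */
int create_private(const char *path, gid_t gid)
{
    /* open()'s mode is filtered through the umask when the file is created... */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* ...whereas fchmod() sets the permission bits exactly */
    if (fchmod(fd, S_IRUSR | S_IWUSR) == -1) {
        perror("fchmod");
        close(fd);
        return -1;
    }

    /* (uid_t) -1 leaves the owner unchanged; (gid_t) -1 would do the same
       for the group. Without CAP_CHOWN, gid must be a group we belong to. */
    if (fchown(fd, (uid_t) -1, gid) == -1) {
        perror("fchown");
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing (gid_t) -1 as gid leaves both owner and group as they were, so only the permission bits change.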

Beyond standard permissions and ownership mapped in the inode, filesystems support arbitrary, user-defined metadata via extended attributes.

Extended Attributes (xattrs)

Extended attributes associate arbitrary key/value pairs with files, enabling feature additions like Access Control Lists (ACLs) or mandatory access controls without restructuring the underlying filesystem.

  • Keys and Values:
    • Keys are valid UTF-8 strings formatted as namespace.attribute (e.g., user.mime_type).
    • Values are arbitrary byte arrays. They are not guaranteed to be null-terminated strings, requiring explicit size tracking during I/O operations.
  • Namespaces: The prefix of the key dictates access policy.
    • system: Used by kernel features such as ACLs; whether a process can read or write these attributes depends on the feature and the security module in place.
    • security: Implements security modules like SELinux; writable only by CAP_SYS_ADMIN.
    • trusted: Restricted user-space data; accessible only by CAP_SYS_ADMIN.
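    • user: The standard namespace for regular processes. Reading an attribute requires read permission on the file, and writing one requires write permission. On Linux, user attributes are permitted only on regular files and directories, not on symbolic links or device files.
  • Operations: Each call comes in three variants: a path form, an l-prefixed form that acts on a symbolic link itself, and an f-prefixed form that takes a file descriptor (e.g., getxattr, lgetxattr, fgetxattr).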
    • getxattr: Retrieves an attribute’s value. Passing a buffer size of 0 returns the necessary buffer size without copying data.
    • setxattr: Assigns a value to a key. Flags include XATTR_CREATE (fails if the key exists) and XATTR_REPLACE (fails if the key is undefined).
    • listxattr: Enumerates all assigned keys, returning them as consecutive null-terminated strings.
    • removexattr: Completely removes an extended attribute, distinct from assigning a zero-length value.
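A sketch of the size-probe pattern for getxattr(), assuming Linux’s <sys/xattr.h>; the helper names are hypothetical. Because a value can change between the probing call and the real one, the loop retries on ERANGE. The value is stored without a trailing null byte and printed with an explicit length, reflecting that xattr values are byte arrays rather than C strings. Some filesystems (for example, tmpfs on older kernels) reject user.* attributes with ENOTSUP.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Store text under key, failing (EEXIST) if the key already exists.
   The trailing NUL is deliberately not stored. */
int set_text_xattr(const char *path, const char *key, const char *text)
{
    return setxattr(path, key, text, strlen(text), XATTR_CREATE);
}

/* Fetch a value of unknown size into a malloc()ed buffer; *len receives
   its length. Returns NULL on error (errno set by getxattr or malloc). */
char *get_xattr_alloc(const char *path, const char *key, size_t *len)
{
    for (;;) {
        ssize_t size = getxattr(path, key, NULL, 0);  /* size probe */
        if (size == -1)
            return NULL;

        char *buf = malloc(size > 0 ? (size_t) size : 1);
        if (buf == NULL)
            return NULL;

        ssize_t got = getxattr(path, key, buf, (size_t) size);
        if (got >= 0 && got <= size) {
            *len = (size_t) got;
            return buf;
        }
        free(buf);
        /* ERANGE (or a nonzero result after a 0-byte probe) means the
           value grew between the two calls: probe again */
        if (got == -1 && errno != ERANGE)
            return NULL;
    }
}

/* Print a value using its explicit length; it need not be NUL-terminated. */
int print_xattr(const char *path, const char *key)
{
    size_t len;
    char *value = get_xattr_alloc(path, key, &len);
    if (value == NULL) {
        perror("getxattr");
        return -1;
    }
    printf("%s: %.*s\n", key, (int) len, value);
    free(value);
    return 0;
}
```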

The abstraction of files and their extended attributes relies heavily on directories, which map human-readable names to the underlying inodes.

Directories and the Current Working Directory

A directory is a file containing a list of directory entries, where each entry is a name mapping to a specific inode number.

  • Structure:
    • Every directory inherently contains . (a self-reference) and .. (a reference to its parent directory).
    • Pathnames are either absolute (starting from the root directory /) or relative (starting from the current working directory).
  • The Current Working Directory (CWD):
    • getcwd(buf, size) copies the absolute path of the CWD into buf. As a Linux (glibc) extension, passing a NULL buffer and a size of 0 makes getcwd() allocate a buffer of the required size, which the caller must free().
    • chdir(path) and fchdir(fd) change the CWD. fchdir() takes a descriptor for an already-opened directory, so the kernel skips the pathname lookup; this makes it a cheap way to return to a saved directory.
  • Creation and Removal:
    • mkdir(path, mode) creates a directory. The final permissions are constrained by the process’s umask (mode & ~umask & 01777).
    • rmdir(path) deletes an empty directory. No native system call recursively deletes populated directories.
  • Reading Directory Contents:
    • opendir(name) opens a directory stream, allocating a DIR object.
    • readdir(dir) sequentially yields a struct dirent for each entry. The structure provides d_name (the filename) and d_ino (the inode).
    • closedir(dir) frees the stream and the backing file descriptor.
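The directory calls above can be combined as in the following sketch, assuming Linux with glibc (the getcwd(NULL, 0) form is the extension noted above). The helper names are hypothetical. count_entries() distinguishes the end of the stream from an error by clearing errno before calling readdir(), since both cases return NULL; count_entries_in() saves the CWD as a descriptor so it can return to it with fchdir().

```c
#define _XOPEN_SOURCE 700  /* POSIX.1-2008 declarations, incl. fchdir() */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* CWD in a malloc()ed buffer via the getcwd(NULL, 0) extension; free() it. */
char *current_dir(void)
{
    return getcwd(NULL, 0);
}

/* Count entries in dirpath, excluding "." and "..". Returns -1 on error. */
long count_entries(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }

    long count = 0;
    struct dirent *entry;
    errno = 0;  /* readdir() returns NULL both at end of stream and on error */
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0)
            count++;
    }
    if (errno != 0) {
        perror("readdir");
        count = -1;
    }
    closedir(dir);  /* also releases the underlying file descriptor */
    return count;
}

/* chdir() into dir, count its entries, then return to the original CWD
   through a descriptor saved beforehand. */
long count_entries_in(const char *dir)
{
    int saved = open(".", O_RDONLY);
    if (saved == -1)
        return -1;

    long count = -1;
    if (chdir(dir) == 0) {
        count = count_entries(".");
        if (fchdir(saved) == -1)  /* no pathname lookup needed */
            perror("fchdir");
    }
    close(saved);
    return count;
}
```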

Directories are fundamentally collections of links, which come in hard and soft variants that map names onto files across the filesystem hierarchy.

Links

A directory entry that maps a filename to an inode is called a link. The kernel decides when a file’s storage can be reclaimed by tracking how many links, and how many open references, point to each inode.

  • Hard Links:
    • Multiple distinct directory entries mapping to the exact same inode.
    • Because inodes are filesystem-specific, hard links cannot span across different filesystems.
    • link(oldpath, newpath) creates a new hard link. All hard links share equal status; no link is the “original.”
    • The kernel maintains a link count (number of directory entries) and a usage count (number of open file descriptors) for each inode. Disk blocks are freed only when both counts reach 0.
  • Symbolic (Soft) Links:
    • Special standalone files that store the target’s pathname as their data payload.
    • Evaluated dynamically at runtime, allowing them to span filesystems or point to nonexistent targets (dangling symlinks).
    • symlink(oldpath, newpath) generates a soft link.
  • Unlinking:
    • unlink(pathname) deletes a name from the filesystem. If executed on a symbolic link, the link is destroyed, preserving the target.
    • remove(path) acts as a unified C library interface, invoking unlink() for files and rmdir() for directories.
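A small sketch of these rules, assuming a POSIX system; add_links() is a hypothetical helper. After it runs, the target’s link count is 2, because only the hard link adds a second directory entry for the same inode; the symbolic link is a separate inode whose data is the target’s pathname. Unlinking the original name afterwards leaves the data reachable through the hard link, while the symbolic link dangles.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: give target a second hard link (hard) and a
 * symbolic link (soft), then return target's link count, or -1 on error.
 * Note that a relative target in symlink() is resolved relative to the
 * directory containing soft, not to the current working directory.
 */
long add_links(const char *target, const char *hard, const char *soft)
{
    struct stat sb;

    if (link(target, hard) == -1) {      /* new name, same inode */
        perror("link");
        return -1;
    }
    if (symlink(target, soft) == -1) {   /* new inode holding a pathname */
        perror("symlink");
        return -1;
    }
    if (stat(target, &sb) == -1) {
        perror("stat");
        return -1;
    }
    return (long) sb.st_nlink;           /* the symlink is not counted */
}
```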

Manipulating file links alters their location and duplication in the hierarchy, forming the basis for high-level copying and moving operations.

Copying and Moving Files

Files are duplicated and relocated by manipulating their underlying data blocks and directory link entries.

  • Copying:
    • POSIX defines no system call that copies a file.
    • User-space utilities (e.g., cp) manually execute sequences of open(), read(), write(), and close() to duplicate file contents into a new inode.
  • Moving:
    • rename(oldpath, newpath) relocates a file by updating directory link entries without modifying the file’s inode or data.
    • Both paths must reside on the same filesystem; otherwise the call fails with EXDEV, and utilities such as mv fall back to a copy-and-unlink sequence.
    • If newpath is an existing file, it is atomically overwritten. If it is an existing directory, it must be empty, or the call fails with ENOTEMPTY.
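The copy loop and the EXDEV fallback can be sketched as follows, assuming a POSIX system. copy_file() and move_file() are hypothetical helpers that copy only the data and the permission bits (filtered by the umask); ownership, timestamps, and extended attributes are not preserved. The inner loop matters because write() may transfer fewer bytes than requested.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>      /* rename() */
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy src's contents into dst (created or
   truncated) with plain read()/write(). Returns 0 or -1. */
int copy_file(const char *src, const char *dst)
{
    char buf[8192];
    struct stat sb;
    ssize_t n;

    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    if (fstat(in, &sb) == -1) {
        close(in);
        return -1;
    }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, sb.st_mode & 0777);
    if (out == -1) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) != 0) {
        if (n == -1) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        /* write() may be partial: loop until the whole chunk is written */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t) (n - off));
            if (w == -1) {
                if (errno == EINTR)
                    continue;
                goto fail;
            }
            off += w;
        }
    }
    close(in);
    return close(out);  /* close() can report deferred write errors */

fail:
    close(in);
    close(out);
    return -1;
}

/* rename() when possible; across filesystems (EXDEV) fall back to
   copy + unlink, which, unlike rename(), is not atomic. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    if (copy_file(src, dst) == -1)
        return -1;
    return unlink(src);
}
```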

Files in the hierarchy are not limited to standard storage mechanisms; they also represent physical hardware interfaces through special device nodes.

Device Nodes and Out-of-Band Communication

Device nodes are special files bridging user space with kernel-level device drivers via major and minor identifying numbers.

  • Special Device Nodes:
    • /dev/null: Discards all write operations; returns EOF on read operations.
    • /dev/zero: Discards all write operations; generates an infinite stream of null bytes (\0) on read operations.
    • /dev/full: Generates null bytes on read operations; triggers ENOSPC (No space left on device) on all write operations.
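    • /dev/random and /dev/urandom: Interface with the kernel’s random number generator. Historically, /dev/random blocked whenever the kernel’s entropy estimate ran low, while /dev/urandom never blocked; since Linux 5.6, /dev/random blocks only until the generator is first initialized, after which the two behave alike.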
  • Out-of-Band Communication (ioctl):
    • ioctl(fd, request, ...) communicates hardware-specific instructions outside the primary byte stream.
    • Requests (e.g., CDROMEJECT) are intercepted by the kernel and routed directly to the specific device driver associated with the file descriptor.
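The behavior of the special device nodes can be checked with ordinary file I/O. The two helpers below are hypothetical and assume the usual Linux /dev nodes exist: reading /dev/null should return 0 (end of file), reading /dev/zero fills the buffer with null bytes, and a write to /dev/full should fail with ENOSPC. Device-specific ioctl() requests such as CDROMEJECT are not exercised here, since they need the corresponding hardware.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read up to len bytes from a device node. Returns the byte count
   (0 means end of file) or -1 on error. */
ssize_t read_device(const char *dev, char *buf, size_t len)
{
    int fd = open(dev, O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}

/* Write one byte to a device node. Returns 0 on success, otherwise the
   errno value from open() or write() (e.g., ENOSPC for /dev/full). */
int write_device(const char *dev)
{
    int fd = open(dev, O_WRONLY);
    if (fd == -1)
        return errno;
    int err = (write(fd, "x", 1) == 1) ? 0 : errno;  /* capture before close() */
    close(fd);
    return err;
}
```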

While ioctl facilitates direct device control, monitoring standard file and directory I/O operations requires a dedicated asynchronous notification subsystem.

Monitoring File Events (inotify)

The inotify subsystem pushes real-time file event notifications to user-space applications, eliminating the overhead and race conditions associated with constant directory polling.

  • Initialization:
    • inotify_init1(flags) creates a new inotify instance and returns a file descriptor from which events are read.
  • Watches and Masks:
    • inotify_add_watch(fd, path, mask) assigns a watch descriptor (wd) to track specific events on a file or directory.
    • The mask is a binary-OR of flags such as IN_ACCESS (reads), IN_MODIFY (writes), IN_CREATE, IN_DELETE, and IN_MOVED_FROM/IN_MOVED_TO.
    • inotify_rm_watch(fd, wd) halts monitoring, which triggers an IN_IGNORED cleanup event.
  • Event Handling:
    • Processes read() from the inotify file descriptor, which fills the buffer with struct inotify_event records.
    • struct inotify_event ends in a variable-length name field (a zero-length, or flexible, array), which is filled in only when the event concerns a file inside a watched directory.
    • Because name is variable-length and null-padded, with len giving its padded size, iteration must advance by sizeof(struct inotify_event) + event->len.
    • The cookie field tracks related operations, uniquely linking IN_MOVED_FROM and IN_MOVED_TO events to trace file renames.
    • Issuing ioctl with the FIONREAD command on the inotify descriptor returns the exact queue size in bytes, optimizing read buffering.
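A sketch of the event-reading loop, assuming Linux; count_events() is a hypothetical helper that sizes its buffer with FIONREAD, reads the queued events, and walks them by sizeof(struct inotify_event) + len, counting those that match a given mask bit and filename. It relies on malloc() returning memory suitably aligned for the struct and on the kernel padding name so that each following record stays aligned.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: drain the events queued on inotify descriptor fd
 * and count those whose mask includes want and whose name equals name.
 * Returns the count, or -1 on error.
 */
int count_events(int fd, uint32_t want, const char *name)
{
    int avail = 0;
    if (ioctl(fd, FIONREAD, &avail) == -1)  /* bytes currently queued */
        return -1;
    if (avail == 0)
        return 0;

    char *buf = malloc((size_t) avail);
    if (buf == NULL)
        return -1;

    ssize_t len = read(fd, buf, (size_t) avail);
    if (len == -1) {
        free(buf);
        return -1;
    }

    int matches = 0;
    for (char *p = buf; p < buf + len; ) {
        struct inotify_event *ev = (struct inotify_event *) p;
        /* len > 0 only for events on entries inside a watched directory */
        if ((ev->mask & want) && ev->len > 0 && strcmp(ev->name, name) == 0)
            matches++;
        p += sizeof(struct inotify_event) + ev->len;
    }
    free(buf);
    return matches;
}
```

A caller would typically create the instance with inotify_init1(IN_NONBLOCK), add a watch with inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE), and call count_events() after changes occur.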

By combining real-time event monitoring with the underlying structures of inodes, extended attributes, directories, and links, applications can both manage files and react promptly as the hierarchy changes.