File and Directory Management
Files and Their Metadata
Every file in a Unix system is represented by an inode, which is both a conceptual entity and a physical on-disk object that stores the file's metadata. Inodes are identified by inode numbers, which are unique within a filesystem. The inode stores the file's access permissions, timestamps, owner, group, length, link count, and the location of its data, but it does not store the filename; names live in directory entries.
- The `stat` Family: A suite of system calls that retrieve a file's inode metadata.
  - `stat(path, buf)`: Retrieves metadata for the file named by `path`, following symbolic links.
  - `fstat(fd, buf)`: Retrieves metadata for the file referenced by an open file descriptor.
  - `lstat(path, buf)`: Identical to `stat`, except that if `path` is a symbolic link, it describes the link itself rather than its target.
- The `stat` Structure: Filled in by the `stat` calls; its core fields are:
  - `st_dev`: ID of the device on which the file resides. Virtual and network filesystems, which have no backing device, are assigned an anonymous device ID whose major number is 0.
  - `st_ino`: The file's inode number, unique within its filesystem.
  - `st_mode`: File type and access permissions.
  - `st_nlink`: Number of hard links pointing to this inode.
  - `st_uid` / `st_gid`: User ID and group ID of the file's owner.
  - `st_rdev`: Device ID, meaningful only if the file itself is a device node.
  - `st_size`: Total file size in bytes.
  - `st_blksize`: Preferred block size for efficient I/O, useful for sizing user-space buffers.
  - `st_blocks`: Number of blocks allocated to the file, counted in 512-byte units on Linux. This can be smaller than `st_size` implies if the file is sparse (contains holes).
  - `st_atime`, `st_mtime`, `st_ctime`: Time of last access, last modification of the file's data, and last status change (a change to the inode's metadata), respectively.
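The following C sketch shows how the `stat` family and `struct stat` fit together. It is a minimal illustration assuming a Linux/glibc system; `file_type` and `print_file_info` are hypothetical helper names, not part of any library.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Map the file-type bits of st_mode to a readable label. */
const char *file_type(mode_t mode)
{
    if (S_ISREG(mode))  return "regular file";
    if (S_ISDIR(mode))  return "directory";
    if (S_ISLNK(mode))  return "symbolic link";
    if (S_ISCHR(mode))  return "character device";
    if (S_ISBLK(mode))  return "block device";
    if (S_ISFIFO(mode)) return "FIFO";
    if (S_ISSOCK(mode)) return "socket";
    return "unknown";
}

/* Print selected inode metadata for path. lstat() is used so that a
 * symbolic link is described itself rather than its target.
 * Returns 0 on success, -1 on error. */
int print_file_info(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) == -1) {
        perror(path);
        return -1;
    }

    printf("%s (%s)\n", path, file_type(sb.st_mode));
    printf("  inode:       %ju\n", (uintmax_t)sb.st_ino);
    printf("  permissions: %04o\n", (unsigned)(sb.st_mode & 07777));
    printf("  hard links:  %ju\n", (uintmax_t)sb.st_nlink);
    printf("  owner:       uid=%ju gid=%ju\n",
           (uintmax_t)sb.st_uid, (uintmax_t)sb.st_gid);
    printf("  size:        %jd bytes\n", (intmax_t)sb.st_size);
    /* st_blocks counts 512-byte units on Linux; for a sparse file this
     * can be less than st_size. */
    printf("  allocated:   %jd bytes\n", (intmax_t)sb.st_blocks * 512);
    printf("  modified:    %s", ctime(&sb.st_mtime)); /* ctime() ends in '\n' */
    return 0;
}
```

Calling `print_file_info("/etc/passwd")` would report the entry's type, inode number, permissions, and sizes. Swapping `lstat` for `stat` changes only how a symbolic link argument is reported.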
Manipulating a file’s metadata requires specific system calls to alter the permission and ownership attributes defined within this stat structure.
Permissions and Ownership
File access is governed by permissions and ownership rules evaluated against the process’s effective User ID (UID) and Group ID (GID).
- Permissions:
  - `chmod(path, mode)` and `fchmod(fd, mode)` set a file's permission bits.
  - The `mode` parameter takes POSIX constants combined with bitwise OR (e.g., `S_IRUSR | S_IWUSR`).
  - Changing permissions requires the calling process's effective UID to match the file's owner, or the process must hold the `CAP_FOWNER` capability.
- Ownership:
  - `chown(path, owner, group)` changes ownership, following symbolic links so that the target is changed.
  - `lchown(path, owner, group)` changes the ownership of a symbolic link itself.
  - `fchown(fd, owner, group)` changes ownership through an open file descriptor.
  - Passing `-1` as the `owner` or `group` argument leaves that attribute unchanged.
  - Changing a file's owner requires the `CAP_CHOWN` capability, which in practice usually means root. The file's owner may, however, change its group to any group of which the owner is a member.
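A minimal sketch of the permission and ownership calls, assuming Linux/glibc. `create_private_file` is a hypothetical helper: it creates a file, tightens its mode with `fchmod`, and uses `fchown` with `-1` so that only the group is touched.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path as a new file readable and writable only by its owner,
 * with its group set to the caller's effective GID.
 * Returns an open descriptor on success, -1 on error. */
int create_private_file(const char *path)
{
    /* The creation mode passed to open() is filtered through the umask. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* fchmod() is not filtered by the umask, and it succeeds because we
     * own the file (otherwise CAP_FOWNER would be needed). */
    if (fchmod(fd, S_IRUSR | S_IWUSR) == -1) {
        perror("fchmod");
        goto fail;
    }

    /* -1 leaves the owner unchanged; only the group is set. An owner may
     * choose any group it belongs to without CAP_CHOWN. New files usually
     * get this group already, unless the directory is set-group-ID. */
    if (fchown(fd, (uid_t)-1, getegid()) == -1) {
        perror("fchown");
        goto fail;
    }
    return fd;

fail:
    close(fd);
    unlink(path);
    return -1;
}
```

Because `fchmod` bypasses the umask, the resulting mode is exactly `0600` regardless of the process's umask, whereas the `0644` passed to `open` may have been reduced.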
Beyond standard permissions and ownership mapped in the inode, filesystems support arbitrary, user-defined metadata via extended attributes.
Extended Attributes (xattrs)
Extended attributes associate arbitrary key/value pairs with files, enabling feature additions like Access Control Lists (ACLs) or mandatory access controls without restructuring the underlying filesystem.
- Keys and Values:
  - Keys are valid UTF-8 strings of the form `namespace.attribute` (e.g., `user.mime_type`).
  - Values are arbitrary byte arrays. They are not guaranteed to be null-terminated strings, so their size must be tracked explicitly during I/O.
- Namespaces: The key's prefix determines its access policy.
  - `system`: Implements kernel features such as ACLs; access is tightly restricted.
  - `security`: Implements security modules such as SELinux; writable only by processes with `CAP_SYS_ADMIN`.
  - `trusted`: Restricted user-space data, accessible only to processes with `CAP_SYS_ADMIN`.
  - `user`: The standard namespace for regular processes. Access follows the file's ordinary read and write permissions, and these attributes are allowed only on regular files and directories (not on symbolic links or device nodes).
- Operations (each available in a path form, an `l` form that acts on a symbolic link itself, and an `f` form that takes a file descriptor):
  - `getxattr`: Retrieves an attribute's value. Passing a buffer size of 0 returns the size of the value without copying it.
  - `setxattr`: Assigns a value to a key. The flag `XATTR_CREATE` makes the call fail if the key already exists; `XATTR_REPLACE` makes it fail if the key does not exist.
  - `listxattr`: Lists all keys set on the file as consecutive null-terminated strings.
  - `removexattr`: Removes an attribute entirely, which is distinct from assigning it a zero-length value.
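The sketch below exercises the path-based extended-attribute calls from `<sys/xattr.h>`, assuming Linux/glibc and a filesystem that supports `user.*` attributes (filesystems without that support fail with `ENOTSUP`). `set_mime_type`, `get_xattr_string`, and `print_xattr_keys` are hypothetical helpers; the `user.mime_type` key follows the example above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Store user.mime_type, failing with EEXIST if the key is already set
 * (XATTR_CREATE). The value is stored as raw bytes, without a trailing NUL.
 * Returns 0 or -1 like setxattr(). */
int set_mime_type(const char *path, const char *mime)
{
    return setxattr(path, "user.mime_type", mime, strlen(mime), XATTR_CREATE);
}

/* Fetch an attribute into a newly allocated, NUL-terminated buffer.
 * Returns NULL on error (errno set); the caller must free() the result. */
char *get_xattr_string(const char *path, const char *name)
{
    /* A size of 0 asks only for the value's length. */
    ssize_t len = getxattr(path, name, NULL, 0);
    if (len == -1)
        return NULL;

    char *buf = malloc((size_t)len + 1);  /* +1 for our own terminator */
    if (buf == NULL)
        return NULL;

    /* Skip the copy for an empty value: a size of 0 would only re-query.
     * If the value grew since the first call, this fails with ERANGE. */
    if (len > 0)
        len = getxattr(path, name, buf, (size_t)len);
    if (len == -1) {
        int saved = errno;
        free(buf);
        errno = saved;
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Print every key set on path. listxattr() returns the keys back to back,
 * each terminated by a NUL byte. Returns the number of keys, or -1. */
int print_xattr_keys(const char *path)
{
    ssize_t len = listxattr(path, NULL, 0);
    if (len <= 0)
        return (int)len;  /* -1 on error, 0 when no keys are set */

    char *list = malloc((size_t)len);
    if (list == NULL)
        return -1;

    len = listxattr(path, list, (size_t)len);
    if (len == -1) {
        free(list);
        return -1;
    }

    int count = 0;
    for (char *key = list; key < list + len; key += strlen(key) + 1) {
        printf("%s\n", key);
        count++;
    }
    free(list);
    return count;
}
```

The two-call pattern in `get_xattr_string` (query the size, then copy) is what the zero-size behavior of `getxattr` is for; `print_xattr_keys` may also list keys from other namespaces, such as `security.selinux`, that the process is allowed to see.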
The abstraction of files and their extended attributes relies heavily on directories, which map human-readable names to the underlying inodes.
Directories and the Current Working Directory
A directory is a file containing a list of directory entries, where each entry is a name mapping to a specific inode number.
- Structure:
  - Every directory contains `.` (a reference to itself) and `..` (a reference to its parent directory).
  - Pathnames are either absolute (starting from the root directory `/`) or relative (resolved from the current working directory).
- The Current Working Directory (CWD):
  - `getcwd(buf, size)` copies the absolute path of the CWD into `buf`. As a glibc extension, passing a `NULL` buffer and a `size` of 0 makes `getcwd` allocate a buffer of the required size, which the caller must `free()`.
  - `chdir(path)` and `fchdir(fd)` change the CWD. `fchdir` can be faster, because the kernel uses the directory the descriptor already references instead of resolving a pathname.
- Creation and Removal:
  - `mkdir(path, mode)` creates a directory. Its final permissions are filtered through the process's umask: `mode & ~umask & 01777` on Linux.
  - `rmdir(path)` deletes an empty directory. No system call recursively deletes a populated directory; tools such as `rm -r` walk the tree and remove each entry first.
- Reading Directory Contents:
  - `opendir(name)` opens a directory stream, allocating a `DIR` object.
  - `readdir(dir)` returns a pointer to a `struct dirent` for each successive entry, and `NULL` at the end of the directory or on error. The structure provides `d_name` (the entry's filename) and `d_ino` (its inode number).
  - `closedir(dir)` frees the stream and closes its underlying file descriptor.
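A minimal sketch, assuming Linux/glibc, that combines the directory calls above: `list_dir` reads a directory stream, and `list_in` temporarily changes the CWD, saving the original as a descriptor so that `fchdir` can restore it. Both function names are hypothetical.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Print each entry of dirpath with its inode number, skipping "." and "..".
 * Returns the number of entries printed, or -1 on error. */
int list_dir(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }

    int count = 0;
    for (;;) {
        errno = 0;  /* readdir() returns NULL both at the end and on error */
        struct dirent *entry = readdir(dir);
        if (entry == NULL)
            break;
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        printf("%10ju  %s\n", (uintmax_t)entry->d_ino, entry->d_name);
        count++;
    }
    if (errno != 0) {
        perror("readdir");
        count = -1;
    }
    closedir(dir);  /* also closes the stream's file descriptor */
    return count;
}

/* List dirpath from inside it, then return to the original CWD through a
 * descriptor saved beforehand. Returns list_dir()'s result, or -1. */
int list_in(const char *dirpath)
{
    int saved = open(".", O_RDONLY | O_DIRECTORY);
    if (saved == -1)
        return -1;

    int count = -1;
    if (chdir(dirpath) == 0) {
        char *cwd = getcwd(NULL, 0);  /* glibc allocates the buffer */
        if (cwd != NULL) {
            printf("listing %s\n", cwd);
            free(cwd);
        }
        count = list_dir(".");
        /* Restore the CWD without resolving a pathname again. */
        if (fchdir(saved) == -1)
            count = -1;
    }
    close(saved);
    return count;
}
```

Saving the starting directory as a descriptor, rather than as a path string from `getcwd`, also keeps the restore working if that directory is renamed in the meantime.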
Directories function fundamentally as collections of links, which exist in both hard and soft variants to map names across the filesystem hierarchy.
Links and Unlinking
Directory entries mapping a filename to an inode are formally known as links. The kernel tracks file lifecycles based on these link configurations.
- Hard Links:
  - Multiple distinct directory entries that map to the same inode.
  - Because inode numbers are unique only within a filesystem, hard links cannot span different filesystems.
  - `link(oldpath, newpath)` creates a new hard link. All hard links to an inode have equal status; no link is the “original.”
  - The kernel maintains a link count (the number of directory entries referencing the inode) and a usage count (the number of times the file is currently open) for each inode. The file's disk blocks are freed only when both counts reach 0.
- Symbolic (Soft) Links:
  - Standalone special files whose data is the pathname of their target.
  - Resolved each time they are used, so they can span filesystems or point to nonexistent targets (dangling symlinks).
  - `symlink(oldpath, newpath)` creates a symbolic link at `newpath` that points to `oldpath`.
- Unlinking:
  - `unlink(pathname)` removes a name from the filesystem. Applied to a symbolic link, it removes the link and leaves the target untouched.
  - `remove(path)` is a unified C library interface that calls `unlink()` for files and `rmdir()` for directories.
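The following sketch, assuming Linux/glibc and a scratch directory passed in as `dir`, walks through the link lifecycle described above: a second hard link, a symbolic link, and a file whose names are all removed while it is still open. `link_lifecycle` and `link_count` are hypothetical helpers, and the file names `a`, `b`, and `s` are arbitrary.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hard-link count of path, without following a final symlink; -1 on error. */
long link_count(const char *path)
{
    struct stat sb;
    return lstat(path, &sb) == -1 ? -1 : (long)sb.st_nlink;
}

/* Walk through hard links, symlinks, and unlinking inside dir.
 * Returns 0 if every step behaves as described, -1 otherwise. */
int link_lifecycle(const char *dir)
{
    char a[PATH_MAX], b[PATH_MAX], s[PATH_MAX];
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);
    snprintf(s, sizeof s, "%s/s", dir);

    int fd = open(a, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;

    int rc = -1;
    struct stat sb;
    char buf[3] = {0};

    if (write(fd, "hi", 2) != 2)
        goto out;

    /* A second hard link: two names, one inode, link count 2. */
    if (link(a, b) == -1 || link_count(a) != 2)
        goto out;

    /* The symlink's data is simply the pathname "a" (relative to dir). */
    if (symlink("a", s) == -1)
        goto out;

    /* Remove both names: the link count drops to 0, but the open
     * descriptor (usage count 1) keeps the inode and its data alive. */
    if (unlink(a) == -1 || unlink(b) == -1)
        goto out;
    if (fstat(fd, &sb) == -1 || sb.st_nlink != 0)
        goto out;
    if (pread(fd, buf, 2, 0) != 2 || strcmp(buf, "hi") != 0)
        goto out;

    /* The symlink now dangles: stat() follows it and fails, lstat() does not. */
    if (stat(s, &sb) == 0 || lstat(s, &sb) == -1)
        goto out;

    /* unlink() on a symlink removes the link itself. */
    rc = unlink(s);

out:
    /* With no names left, closing the last descriptor lets the kernel
     * free the inode and its blocks. */
    close(fd);
    return rc;
}
```

The `fstat` check shows the two counters separately: after both names are gone `st_nlink` is 0, yet `pread` still returns the data because the open descriptor holds the usage count above 0.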
Because a filename is only a link, moving a file can be done by relinking it, whereas copying requires duplicating its data.
Copying and Moving Files
Files are duplicated and relocated by manipulating their underlying data blocks and directory link entries.
- Copying:
  - Unix provides no native system call for copying files.
  - User-space utilities (e.g., `cp`) instead execute a sequence of `open()`, `read()`, `write()`, and `close()` calls to duplicate the file's contents into a new file, and therefore a new inode.
- Moving:
  - `rename(oldpath, newpath)` moves a file by updating directory entries; the file keeps its inode, and its data is not copied.
  - Both paths must reside on the same filesystem; otherwise the call fails with `EXDEV`. Utilities such as `mv` then fall back to copying the file and unlinking the original.
  - If `newpath` names an existing file, it is atomically replaced. If `oldpath` is a directory and `newpath` names an existing directory, that directory must be empty, or the call fails with `ENOTEMPTY`.
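A minimal sketch of both operations, assuming Linux/glibc: `copy_file` duplicates data with the `open`/`read`/`write`/`close` sequence, and `move_file` tries `rename` first and falls back to copy-and-unlink on `EXDEV`. Both are hypothetical helpers, not the actual `cp` or `mv` source; in particular, the fallback is not atomic and does not preserve ownership or timestamps.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* write() may transfer fewer bytes than asked, so retry until done. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Copy src into dst (a new inode). Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;

    struct stat sb;
    if (fstat(in, &sb) == -1) {
        close(in);
        return -1;
    }

    /* Reuse the source's permission bits (still subject to the umask). */
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, sb.st_mode & 0777);
    if (out == -1) {
        close(in);
        return -1;
    }

    char buf[8192];
    int rc = 0;
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            break;  /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        if (write_all(out, buf, (size_t)n) == -1) {
            rc = -1;
            break;
        }
    }

    if (close(out) == -1)  /* some write errors surface only at close */
        rc = -1;
    close(in);
    return rc;
}

/* rename() when possible; across filesystems (EXDEV), copy then unlink. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    if (copy_file(src, dst) == -1)
        return -1;
    return unlink(src);
}
```

On a single filesystem, `move_file` never touches the data: the destination name simply refers to the same inode the source name did.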
Files in the hierarchy are not limited to standard storage mechanisms; they also represent physical hardware interfaces through special device nodes.
Device Nodes and Out-of-Band Communication
Device nodes are special files bridging user space with kernel-level device drivers via major and minor identifying numbers.
- Special Device Nodes:
  - `/dev/null`: Discards all writes; reads return EOF.
  - `/dev/zero`: Discards all writes; reads return an endless stream of null bytes (`\0`).
  - `/dev/full`: Reads return null bytes; every write fails with `ENOSPC` (no space left on device).
  - `/dev/random` and `/dev/urandom`: Interfaces to the kernel's random number generator. Historically, `/dev/random` blocked when the kernel's entropy estimate ran low, while `/dev/urandom` never blocked and kept producing pseudorandom output. Since Linux 5.6, `/dev/random` blocks only until the generator has been initialized, after which it behaves like `/dev/urandom`.
- Out-of-Band Communication (`ioctl`):
  - `ioctl(fd, request, ...)` sends device-specific commands outside the file's primary byte stream.
  - Requests (e.g., `CDROMEJECT`) are passed by the kernel to the device driver associated with the file descriptor.
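The sketch below, assuming Linux/glibc, probes `/dev/null`, `/dev/zero`, and `/dev/full`, and issues an `ioctl`. Because device-specific requests such as `CDROMEJECT` need real hardware, it uses the generic `FIONREAD` request on an arbitrary descriptor as a stand-in; all helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* 1 if a read from /dev/null reports EOF (returns 0), else 0. */
int null_reads_eof(void)
{
    char c;
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
        return 0;
    ssize_t n = read(fd, &c, 1);
    close(fd);
    return n == 0;
}

/* 1 if n bytes (capped at 256) read from /dev/zero are all '\0', else 0. */
int zero_reads_nul(size_t n)
{
    unsigned char buf[256];
    if (n > sizeof buf)
        n = sizeof buf;

    int fd = open("/dev/zero", O_RDONLY);
    if (fd == -1)
        return 0;
    ssize_t got = read(fd, buf, n);
    close(fd);
    if (got != (ssize_t)n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* errno from writing one byte to /dev/full (expected: ENOSPC);
 * 0 if the write unexpectedly succeeded. */
int full_write_errno(void)
{
    int fd = open("/dev/full", O_WRONLY);
    if (fd == -1)
        return errno;
    int err = write(fd, "x", 1) == -1 ? errno : 0;
    close(fd);
    return err;
}

/* ioctl(fd, request, arg): FIONREAD stores the number of bytes waiting to
 * be read in an int. Device-specific requests such as CDROMEJECT use the
 * same call but are handled by that device's driver. */
int bytes_pending(int fd)
{
    int n;
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```

For example, after writing three bytes into a pipe, `bytes_pending` on the pipe's read end would report 3 without consuming them.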
While ioctl facilitates direct device control, monitoring standard file and directory I/O operations requires a dedicated asynchronous notification subsystem.
Monitoring File Events (inotify)
The inotify subsystem pushes real-time file event notifications to user-space applications, eliminating the overhead and race conditions associated with constant directory polling.
- Initialization:
  - `inotify_init1(flags)` creates a new inotify instance and returns a file descriptor representing it.
- Watches and Masks:
  - `inotify_add_watch(fd, path, mask)` adds a watch on a file or directory and returns a watch descriptor (`wd`).
  - The `mask` is a bitwise OR of flags such as `IN_ACCESS` (reads), `IN_MODIFY` (writes), `IN_CREATE`, `IN_DELETE`, and `IN_MOVED_FROM` / `IN_MOVED_TO`.
  - `inotify_rm_watch(fd, wd)` stops monitoring, which generates an `IN_IGNORED` event.
- Event Handling:
  - Processes `read()` from the inotify file descriptor to obtain one or more `struct inotify_event` records.
  - `struct inotify_event` ends with a zero-length array, `name`, which is filled in only when the event concerns a file inside a watched directory.
  - Because `name` is padded to a variable length, iteration must advance by `sizeof(struct inotify_event) + event->len`.
  - The `cookie` field links related operations: a matching `IN_MOVED_FROM` and `IN_MOVED_TO` pair shares the same cookie, which lets a program trace a rename.
  - Issuing `ioctl` with the `FIONREAD` request on the inotify descriptor returns the number of bytes currently queued, which can be used to size the read buffer.
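A minimal sketch of the event-handling steps, assuming Linux/glibc and a descriptor created with `inotify_init1(IN_NONBLOCK)`. The hypothetical `drain_events` helper sizes its buffer with `FIONREAD`, then walks the variable-length records as described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Label the event types this sketch subscribes to. */
static const char *event_name(uint32_t mask)
{
    if (mask & IN_CREATE)     return "IN_CREATE";
    if (mask & IN_DELETE)     return "IN_DELETE";
    if (mask & IN_MOVED_FROM) return "IN_MOVED_FROM";
    if (mask & IN_MOVED_TO)   return "IN_MOVED_TO";
    if (mask & IN_IGNORED)    return "IN_IGNORED";
    return "other";
}

/* Read and print every event currently queued on the inotify descriptor
 * ifd. Returns the number of events handled, or -1 on error. */
int drain_events(int ifd)
{
    /* FIONREAD reports how many bytes are queued, so one read() suffices. */
    int avail;
    if (ioctl(ifd, FIONREAD, &avail) == -1)
        return -1;
    if (avail == 0)
        return 0;

    /* malloc() returns memory suitably aligned for struct inotify_event. */
    char *buf = malloc((size_t)avail);
    if (buf == NULL)
        return -1;

    ssize_t len = read(ifd, buf, (size_t)avail);
    if (len == -1) {
        int err = errno;
        free(buf);
        return err == EAGAIN ? 0 : -1;
    }

    int count = 0;
    for (char *p = buf; p < buf + len; count++) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        /* name is set only for entries inside a watched directory. */
        printf("wd=%d %-13s cookie=%u name=%s\n", ev->wd, event_name(ev->mask),
               (unsigned)ev->cookie, ev->len > 0 ? ev->name : "");
        /* len includes the name's NUL padding, so this lands on the next record. */
        p += sizeof(struct inotify_event) + ev->len;
    }
    free(buf);
    return count;
}
```

A caller would typically add a watch with `inotify_add_watch(ifd, dir, IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO)` and call `drain_events` whenever `poll()` reports the descriptor readable. Renaming a file inside `dir` then prints an `IN_MOVED_FROM` / `IN_MOVED_TO` pair carrying the same cookie.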
Together, inodes, extended attributes, directories, links, device nodes, and inotify make up the interfaces through which programs manage and observe files and directories.