The Virtual Filesystem

The Virtual Filesystem (VFS) is a kernel subsystem that implements file and filesystem-related interfaces for user-space programs. It acts as an abstraction layer, enabling standard system calls like open(), read(), and write() to operate interchangeably across diverse filesystems and physical media. By defining a common file model containing basic conceptual interfaces and data structures, the VFS hides underlying implementation details. User-space system calls invoke generic VFS methods, which in turn map to filesystem-specific backend implementations.

The VFS relies on and codifies the core paradigms established by traditional Unix filesystems.

Unix Filesystem Concepts

  • Filesystem: A hierarchical store of data and control information, mounted at a specific point (the mount point) in a global, unified namespace.
  • File: An ordered string of bytes assigned a human-readable name.
  • Directory: A file whose contents list the files and subdirectories it contains; Unix treats directories as ordinary files. Each component of a path is called a directory entry.
  • Inode (Index Node): A data structure storing file metadata, such as access permissions, size, owner, and timestamps. The inode does not store the file's name; names live in directory entries.
  • Superblock: A central data structure containing filesystem metadata and control information.
  • Non-Unix Filesystems: Filesystems lacking inherent Unix structures (like FAT or NTFS) must assemble and simulate these concepts in memory to interface with the VFS.
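
The split between names (directory entries) and metadata (inodes) is visible from user space. The sketch below uses the POSIX stat() call to check whether two paths name the same inode and to read its link count. It assumes a POSIX system whose temporary directory supports hard links; the helper names same_inode() and link_count() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if both paths resolve to the same inode on the same device,
 * 0 if they do not, and -1 if either path cannot be stat()ed. */
static int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    /* An inode is identified by (device, inode number), not by name. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Returns how many directory entries (hard links) name the inode
 * behind path, or 0 if the path cannot be stat()ed. */
static nlink_t link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    return st.st_nlink;
}
```

After link(old, new), both names report the same inode and a link count of 2; removing one name with unlink() leaves the inode, and the other name, intact.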

These foundational concepts are instantiated by the VFS into distinct, object-oriented C structures.

VFS Objects and Their Data Structures

The VFS utilizes C structures containing both data and pointers to filesystem-implemented functions, simulating an object-oriented paradigm.

  • Primary Objects:
    • Superblock Object: Represents a specific mounted filesystem.
    • Inode Object: Represents a specific file.
    • Dentry Object: Represents a directory entry, forming a single component of a path.
    • File Object: Represents an open file associated with a process.
  • Operations Objects: Each primary object contains an operations object (super_operations, inode_operations, dentry_operations, file_operations) detailing the executable methods the kernel can invoke.
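
The object-plus-operations-table pattern can be sketched in ordinary C. In this minimal user-space example, two toy "filesystems" supply different read methods, and a generic layer dispatches through the table without knowing which one it is calling. All struct and function names (sketch_file, sketch_vfs_read, and so on) are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct sketch_file;

/* Hypothetical, heavily trimmed analogue of struct file_operations. */
struct sketch_file_operations {
    ssize_t (*read)(struct sketch_file *f, char *buf, size_t len);
};

/* Primary object: data plus a pointer to its operations table. */
struct sketch_file {
    const struct sketch_file_operations *f_op;
    const char *data;   /* backing bytes for the in-memory toy filesystem */
    size_t pos;         /* current offset */
};

/* Toy filesystem A: serves bytes from an in-memory string. */
static ssize_t memfs_read(struct sketch_file *f, char *buf, size_t len)
{
    size_t avail = strlen(f->data) - f->pos;
    size_t n = len < avail ? len : avail;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (ssize_t)n;
}

/* Toy filesystem B: behaves like /dev/zero and always returns zero bytes. */
static ssize_t zerofs_read(struct sketch_file *f, char *buf, size_t len)
{
    (void)f;
    memset(buf, 0, len);
    return (ssize_t)len;
}

static const struct sketch_file_operations memfs_fops = { .read = memfs_read };
static const struct sketch_file_operations zerofs_fops = { .read = zerofs_read };

/* Generic layer: knows nothing about either filesystem, only the table. */
static ssize_t sketch_vfs_read(struct sketch_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->f_op != NULL);
    if (f->f_op->read == NULL)
        return -1;  /* operation not supported by this filesystem */
    return f->f_op->read(f, buf, len);
}
```

Adding a third filesystem only requires a new operations table; sketch_vfs_read() stays unchanged, which is the point of the abstraction.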

At the highest level of the filesystem hierarchy resides the superblock object, dictating global filesystem state.

The Superblock Object

  • Purpose: Stores control information describing a specific mounted filesystem.
  • Structure: Represented by struct super_block (defined in <linux/fs.h>).
  • Initialization: Created and populated via alloc_super() when the filesystem is mounted and its control block is read from disk.
  • Superblock Operations (struct super_operations):
    • alloc_inode() / destroy_inode(): Creates and initializes, or deallocates, an inode object under the given superblock.
    • write_super() / sync_fs(): Synchronizes modified in-memory superblock or filesystem metadata with the on-disk backing store.
    • dirty_inode(): Invoked by the VFS when an inode is modified; utilized by journaling filesystems to perform journal updates.
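
A toy superblock can illustrate how alloc_inode(), destroy_inode(), dirty_inode(), and sync_fs() cooperate. This is a user-space sketch under simplifying assumptions: the signatures are trimmed (the kernel's sync_fs() also takes a wait flag), the inode table is a fixed array, and "writing to disk" is just clearing a flag. All names prefixed sketch_ or toy_ are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_MAX_INODES 16

struct sketch_inode {
    unsigned long i_ino;
    bool dirty;
};

struct sketch_super_block;

/* Hypothetical trimmed analogue of struct super_operations. */
struct sketch_super_operations {
    struct sketch_inode *(*alloc_inode)(struct sketch_super_block *sb);
    void (*destroy_inode)(struct sketch_super_block *sb, struct sketch_inode *inode);
    void (*dirty_inode)(struct sketch_super_block *sb, struct sketch_inode *inode);
    int (*sync_fs)(struct sketch_super_block *sb);
};

struct sketch_super_block {
    const struct sketch_super_operations *s_op;
    unsigned long next_ino;
    struct sketch_inode *inodes[SKETCH_MAX_INODES]; /* inodes under this sb */
    unsigned long writebacks;  /* inodes "written back" so far */
};

static struct sketch_inode *toy_alloc_inode(struct sketch_super_block *sb)
{
    for (int i = 0; i < SKETCH_MAX_INODES; i++) {
        if (sb->inodes[i] == NULL) {
            struct sketch_inode *inode = calloc(1, sizeof *inode);
            if (inode == NULL)
                return NULL;
            inode->i_ino = ++sb->next_ino;
            sb->inodes[i] = inode;
            return inode;
        }
    }
    return NULL;  /* toy limit reached */
}

static void toy_destroy_inode(struct sketch_super_block *sb, struct sketch_inode *inode)
{
    for (int i = 0; i < SKETCH_MAX_INODES; i++) {
        if (sb->inodes[i] == inode) {
            sb->inodes[i] = NULL;
            free(inode);
            return;
        }
    }
}

/* A journaling filesystem would log the change here; the toy only flags it. */
static void toy_dirty_inode(struct sketch_super_block *sb, struct sketch_inode *inode)
{
    (void)sb;
    assert(inode != NULL);
    inode->dirty = true;
}

/* Stand-in for flushing dirty metadata. Returns 0 on success, following the
 * usual kernel convention; flushed inodes are counted in sb->writebacks. */
static int toy_sync_fs(struct sketch_super_block *sb)
{
    for (int i = 0; i < SKETCH_MAX_INODES; i++) {
        struct sketch_inode *inode = sb->inodes[i];
        if (inode != NULL && inode->dirty) {
            inode->dirty = false;
            sb->writebacks++;
        }
    }
    return 0;
}

static const struct sketch_super_operations toy_sops = {
    .alloc_inode = toy_alloc_inode,
    .destroy_inode = toy_destroy_inode,
    .dirty_inode = toy_dirty_inode,
    .sync_fs = toy_sync_fs,
};
```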

While the superblock describes the filesystem as a whole, individual files and directories are represented in memory by inode objects.

The Inode Object

  • Purpose: Contains all metadata and information required by the kernel to manipulate a file or directory.
  • Structure: Represented by struct inode (defined in <linux/fs.h>).
  • Instantiation: Constructed in memory as files are accessed.
  • Special Files: Uses a union to hold pointers for special file types, accommodating i_pipe (pipes), i_bdev (block devices), or i_cdev (character devices).
  • Inode Operations (struct inode_operations):
    • create(): Invoked by the creat() system call, and by open() when O_CREAT is specified, to build a new inode associated with a specific dentry.
    • lookup(): Searches a directory for an inode corresponding to a specified filename.
    • link(), unlink(), symlink(): Create a hard link, remove a directory entry (hard link), and create a symbolic link, respectively.
    • mkdir(), rmdir(), mknod(): Create and remove directories, and create special files such as device nodes, named pipes, and sockets.
    • truncate(): Modifies the size of a given file.
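
The directory-facing inode operations can be pictured with a toy directory that is literally a table of (name, inode number) pairs. The sketch below is a user-space illustration only: the kernel's lookup() and create() operate on struct inode and struct dentry, whereas these hypothetical sketch_ functions use plain names and numbers so the link/unlink semantics are easy to see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_MAX 32
#define SKETCH_DIR_SLOTS 8

/* A toy directory entry: a name bound to an inode number (0 = empty slot). */
struct sketch_dirent {
    char name[SKETCH_NAME_MAX];
    unsigned long ino;
};

struct sketch_dir_inode {
    struct sketch_dirent entries[SKETCH_DIR_SLOTS];
};

/* lookup(): return the inode number bound to name, or 0 if absent. */
static unsigned long sketch_lookup(const struct sketch_dir_inode *dir, const char *name)
{
    for (size_t i = 0; i < SKETCH_DIR_SLOTS; i++) {
        if (dir->entries[i].ino != 0 && strcmp(dir->entries[i].name, name) == 0)
            return dir->entries[i].ino;
    }
    return 0;
}

/* create(): bind a new name to ino. Returns 0, or -1 if the name already
 * exists, is too long, or the directory is full. */
static int sketch_create(struct sketch_dir_inode *dir, const char *name, unsigned long ino)
{
    assert(ino != 0);
    if (strlen(name) >= SKETCH_NAME_MAX || sketch_lookup(dir, name) != 0)
        return -1;
    for (size_t i = 0; i < SKETCH_DIR_SLOTS; i++) {
        if (dir->entries[i].ino == 0) {
            strcpy(dir->entries[i].name, name);
            dir->entries[i].ino = ino;
            return 0;
        }
    }
    return -1;
}

/* link(): a hard link is simply a second entry naming an existing inode. */
static int sketch_link(struct sketch_dir_inode *dir, const char *existing, const char *newname)
{
    unsigned long ino = sketch_lookup(dir, existing);
    return ino == 0 ? -1 : sketch_create(dir, newname, ino);
}

/* unlink(): remove one name; other entries naming the same inode are untouched. */
static int sketch_unlink(struct sketch_dir_inode *dir, const char *name)
{
    for (size_t i = 0; i < SKETCH_DIR_SLOTS; i++) {
        if (dir->entries[i].ino != 0 && strcmp(dir->entries[i].name, name) == 0) {
            dir->entries[i].ino = 0;
            return 0;
        }
    }
    return -1;
}
```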

Inodes manage the physical file properties, but the VFS resolves and navigates the hierarchical namespace using dentry objects.

The Dentry Object and Dcache

  • Purpose: Represents a specific component in a path (e.g., /, bin, and vi in the path /bin/vi). Unlike inodes, dentries do not correspond to any on-disk data structure; the VFS constructs them on the fly from string path names.
  • Structure: Represented by struct dentry (defined in <linux/dcache.h>).
  • Dentry States:
    • Used: Associated with a valid inode (d_inode points to an inode) and currently in use by the VFS (d_count > 0). Cannot be discarded.
    • Unused: Associated with a valid inode, but not currently in active use (d_count == 0). Cached for rapid future lookups, but discardable under memory pressure.
    • Negative: Not associated with a valid inode (d_inode == NULL). Cached to quickly reject invalid subsequent lookups.
  • The Dentry Cache (dcache):
    • Bypasses expensive filesystem walks by caching resolved path components.
    • Operates via three mechanisms: lists of used dentries linked to their inode, a doubly linked LRU list of unused/negative dentries, and a hash table for rapid resolution.
    • Acts as a frontend to the inode cache (icache); a cached dentry maintains a positive usage count on its corresponding inode, pinning the inode in memory.
  • Dentry Operations (struct dentry_operations):
    • d_revalidate(): Verifies dentry object validity prior to use from the dcache.
    • d_hash(), d_compare(): Customizes hashing and string comparison logic (e.g., enabling case-insensitive matching for FAT filesystems).
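
The dcache's hash lookup and the three dentry states can be sketched in user space. This example is loosely modeled on the kernel's d_lookup(), d_add(), and dput(), but the structures and _sketch functions are hypothetical: the hash is keyed on (parent, name), a NULL d_inode records a negative dentry, and d_count distinguishes used from unused entries. The LRU list and memory-pressure eviction are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DCACHE_BUCKETS 64

struct sketch_inode { unsigned long i_ino; };

struct sketch_dentry {
    const struct sketch_dentry *d_parent;
    char *d_name;
    struct sketch_inode *d_inode;   /* NULL => negative dentry */
    int d_count;                    /* > 0 => used, 0 => unused */
    struct sketch_dentry *hash_next;
};

static struct sketch_dentry *dcache[DCACHE_BUCKETS];

/* Hash on (parent pointer, name) so "bin" under "/" and "bin" under "/usr"
 * are distinct entries. Multiply-by-33 string hash seeded with the parent. */
static size_t d_hash_sketch(const struct sketch_dentry *parent, const char *name)
{
    size_t h = (size_t)(uintptr_t)parent;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h = h * 33u + *p;
    return h % DCACHE_BUCKETS;
}

/* Cached lookup: returns the dentry with d_count incremented, or NULL on a
 * miss (the caller would then ask the filesystem's lookup()). */
static struct sketch_dentry *d_lookup_sketch(const struct sketch_dentry *parent, const char *name)
{
    for (struct sketch_dentry *d = dcache[d_hash_sketch(parent, name)]; d; d = d->hash_next) {
        if (d->d_parent == parent && strcmp(d->d_name, name) == 0) {
            d->d_count++;
            return d;
        }
    }
    return NULL;
}

/* Cache the result of a slow filesystem lookup. inode may be NULL to record
 * a negative dentry. The new dentry starts with one reference. */
static struct sketch_dentry *d_add_sketch(const struct sketch_dentry *parent, const char *name,
                                          struct sketch_inode *inode)
{
    struct sketch_dentry *d = calloc(1, sizeof *d);
    if (d == NULL)
        return NULL;
    d->d_name = malloc(strlen(name) + 1);
    if (d->d_name == NULL) {
        free(d);
        return NULL;
    }
    strcpy(d->d_name, name);
    d->d_parent = parent;
    d->d_inode = inode;
    d->d_count = 1;
    size_t b = d_hash_sketch(parent, name);
    d->hash_next = dcache[b];
    dcache[b] = d;
    return d;
}

/* Drop a reference; at zero the dentry stays cached in the unused state. */
static void dput_sketch(struct sketch_dentry *d)
{
    assert(d->d_count > 0);
    d->d_count--;
}

static const char *d_state_sketch(const struct sketch_dentry *d)
{
    if (d->d_inode == NULL)
        return "negative";
    return d->d_count > 0 ? "used" : "unused";
}
```

A repeated lookup of a name that does not exist hits the negative dentry in the hash table instead of walking the filesystem again.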

Once the dcache successfully resolves a path to a dentry and its underlying inode, a process can open the file, creating a file object.

The File Object

  • Purpose: The in-memory representation of an open file from the perspective of a process.
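  • Lifecycle: Created in response to the open() system call. close() drops a reference, and the object is destroyed once its last reference is released (descriptors duplicated with dup() or inherited across fork() share the same file object).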
  • Structure: Represented by struct file (defined in <linux/fs.h>). Tracks per-open-instance state such as the file access mode, the current file offset (f_pos), and an f_path structure holding the associated dentry and vfsmount.
  • Relationships: Multiple file objects can exist concurrently for the same file when it is opened multiple times, each with its own offset, but they all point back to the same inode (and, when opened through the same name, the same dentry).
  • File Operations (struct file_operations):
    • read(), write(), aio_read(), aio_write(): read() and write() transfer data synchronously and advance the file offset; aio_read() and aio_write() initiate asynchronous transfers.
    • mmap(): Memory maps the file onto a given address space.
    • unlocked_ioctl(): Replaces the legacy ioctl() method, which ran under the coarse Big Kernel Lock (BKL). The VFS calls unlocked_ioctl() without the BKL, so drivers must provide their own synchronization.
    • compat_ioctl(): Provides a portable interface for 32-bit applications running on 64-bit systems, performing necessary size conversions.
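
The relationship between file objects and inodes can be sketched with two independent opens of one in-memory file. In this user-space illustration each file object carries its own offset and reference count, while both share a single inode. The sketch_ names are hypothetical; sketch_fget()/sketch_fput() only loosely echo the kernel's reference-counting helpers, and the inode stands in for the one reached through f_path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct sketch_inode {
    const char *contents;
    size_t i_size;
    int i_count;       /* references held by open file objects */
};

struct sketch_file {
    struct sketch_inode *inode;  /* stands in for f_path.dentry->d_inode */
    long f_pos;                  /* per-open-instance offset */
    int f_count;                 /* references to this open instance */
};

static struct sketch_file *sketch_open(struct sketch_inode *inode)
{
    struct sketch_file *f = calloc(1, sizeof *f);
    if (f == NULL)
        return NULL;
    f->inode = inode;
    f->f_count = 1;
    inode->i_count++;
    return f;
}

/* read(): copy from the current offset and advance f_pos by the bytes read. */
static ssize_t sketch_read(struct sketch_file *f, char *buf, size_t len)
{
    size_t pos = (size_t)f->f_pos;
    if (pos >= f->inode->i_size)
        return 0;  /* end of file */
    size_t n = f->inode->i_size - pos;
    if (n > len)
        n = len;
    memcpy(buf, f->inode->contents + pos, n);
    f->f_pos += (long)n;
    return (ssize_t)n;
}

/* dup()-style sharing: another reference to the same file object. */
static struct sketch_file *sketch_fget(struct sketch_file *f)
{
    f->f_count++;
    return f;
}

/* Drop a reference; the object is freed only when the last one goes away.
 * Returns 1 if the file object was destroyed. */
static int sketch_fput(struct sketch_file *f)
{
    assert(f->f_count > 0);
    if (--f->f_count > 0)
        return 0;
    f->inode->i_count--;
    free(f);
    return 1;
}
```

Two separate opens advance their offsets independently; a duplicated reference shares one offset, and the file object survives the first close.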

File objects tie user-space processes directly to file data, but the VFS additionally requires overarching structures to track mounted filesystems.

Filesystem and Mount Data Structures

  • Filesystem Type (struct file_system_type):
    • Describes the capabilities and behavior of a specific filesystem variant (e.g., ext3, UDF).
    • Contains the get_sb() method, which reads the superblock from disk when a filesystem of this type is mounted.
    • Exists as a single unique instance per filesystem type, irrespective of the number of active mounts.
  • Mount Point (struct vfsmount):
    • Represents a specific, mounted instance of a filesystem.
    • Maintains linked lists mapping the relationship and topology between the current mount, its parent, and its children.
    • Stores mnt_flags enforcing mount-specific parameters, such as MNT_NOSUID (forbidding setuid/setgid execution) or MNT_NOEXEC (forbidding all binary execution).
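
The split between one file_system_type per filesystem and one vfsmount per mount can be sketched as a small registry. This user-space example loosely mirrors the roles of the kernel's register_filesystem() and get_fs_type(), but every _sketch name is hypothetical, and the MNT_NOSUID/MNT_NOEXEC values are defined locally for illustration rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative flag values for this sketch only. */
#define MNT_NOSUID 0x01
#define MNT_NOEXEC 0x04

struct file_system_type_sketch {
    const char *name;
    struct file_system_type_sketch *next;
};

struct vfsmount_sketch {
    const struct file_system_type_sketch *fstype;  /* shared, never copied */
    const char *mountpoint;
    int mnt_flags;
};

static struct file_system_type_sketch *fs_types;  /* registry head */

/* Register a type once; registering the same name again fails. */
static int register_fs_sketch(struct file_system_type_sketch *fs)
{
    for (struct file_system_type_sketch *p = fs_types; p; p = p->next)
        if (strcmp(p->name, fs->name) == 0)
            return -1;
    fs->next = fs_types;
    fs_types = fs;
    return 0;
}

static struct file_system_type_sketch *get_fs_type_sketch(const char *name)
{
    for (struct file_system_type_sketch *p = fs_types; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Each mount gets its own vfsmount, but all point at the one registered type. */
static struct vfsmount_sketch mount_sketch(const char *type, const char *where, int flags)
{
    struct vfsmount_sketch m = { get_fs_type_sketch(type), where, flags };
    return m;
}

/* Per-mount policy: execution is refused on MNT_NOEXEC mounts. */
static bool may_exec_sketch(const struct vfsmount_sketch *m)
{
    assert(m->fstype != NULL);
    return (m->mnt_flags & MNT_NOEXEC) == 0;
}

/* setuid/setgid bits are ignored on MNT_NOSUID mounts. */
static bool honors_setuid_sketch(const struct vfsmount_sketch *m)
{
    return (m->mnt_flags & MNT_NOSUID) == 0;
}
```

Mounting the same type at / and /tmp yields two vfsmount instances that share one file_system_type but enforce different per-mount flags.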

The global filesystem hierarchy relies on these mount structures, but each process maintains its own localized view of this hierarchy and its open files.

Data Structures Associated with a Process

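  • struct files_struct: Pointed to by the process descriptor. Contains all per-process information about open files and descriptors, including fd_array, the default array of pointers to open file objects, indexed by file descriptor (a larger table is allocated if the process opens more files).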
  • struct fs_struct: Pointed to by the process descriptor. Holds path structures dictating the process’s current working directory (pwd) and root directory (root).
  • struct mnt_namespace: Represents a process’s unique view of the mounted filesystem hierarchy. By default, processes inherit their parent’s namespace, but specifying the CLONE_NEWNS flag during process creation provides a distinct, independent copy.
  • Thread Sharing: Tasks created with CLONE_FILES share a single files_struct, and tasks created with CLONE_FS share a single fs_struct; threads typically use both flags. Reference counts (such as the count field in files_struct) keep a shared structure alive until the last task using it releases it.
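
Descriptor allocation and CLONE_FILES-style sharing can be sketched with a reference-counted table. This user-space example assumes a tiny fixed table and hypothetical _sketch names; it does not model table growth, and a real copy would also take a reference on each open file, which is omitted here. The "lowest free descriptor" rule matches the POSIX requirement for open().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_OPEN_SKETCH 8   /* toy limit; real tables are larger and can grow */

struct file_sketch { int id; };

struct files_struct_sketch {
    int count;                                    /* tasks sharing this table */
    struct file_sketch *fd_array[NR_OPEN_SKETCH];
};

/* Install f at the lowest free descriptor, as POSIX requires of open(). */
static int fd_install_sketch(struct files_struct_sketch *files, struct file_sketch *f)
{
    for (int fd = 0; fd < NR_OPEN_SKETCH; fd++) {
        if (files->fd_array[fd] == NULL) {
            files->fd_array[fd] = f;
            return fd;
        }
    }
    return -1;  /* table full (open() would fail with EMFILE) */
}

/* Emulate task creation: with share_files (like CLONE_FILES) the child takes
 * a reference to the parent's table; otherwise it gets a private copy. */
static struct files_struct_sketch *copy_files_sketch(struct files_struct_sketch *parent,
                                                     bool share_files)
{
    if (share_files) {
        parent->count++;
        return parent;
    }
    struct files_struct_sketch *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    *copy = *parent;   /* same file pointers, separate table */
    copy->count = 1;
    return copy;
}

/* Drop a reference; the table is freed only when no task uses it.
 * Returns 1 if the table was freed. */
static int put_files_sketch(struct files_struct_sketch *files)
{
    assert(files->count > 0);
    if (--files->count > 0)
        return 0;
    free(files);
    return 1;
}
```

A descriptor opened through the shared table is immediately visible to the other thread, while the privately copied table is unaffected.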