Additional capabilities compared to the inotify(7) API include the ability to monitor all of the objects in a mounted filesystem, the ability to make access permission decisions, and the possibility to read or modify files before access by other applications.
The following system calls are used with this API: fanotify_init(2), fanotify_mark(2), read(2), write(2), and close(2).
An fanotify notification group is a kernel-internal object that holds a list of files, directories, filesystems, and mounts for which events shall be created.
For each entry in an fanotify notification group, two bit masks exist: the mark mask and the ignore mask. The mark mask defines file activities for which an event shall be created. The ignore mask defines activities for which no event shall be generated. Having these two types of masks permits a filesystem, mount, or directory to be marked for receiving events, while at the same time ignoring events for specific objects under a mount or directory.
The fanotify_mark(2) system call adds a file, directory, filesystem, or mount to a notification group and specifies which events shall be reported (or ignored), or removes or modifies such an entry.
A possible usage of the ignore mask is for a file cache. Events of interest for a file cache are modification of a file and closing of the same. Hence, the cached directory or mount is to be marked to receive these events. After receiving the first event informing that a file has been modified, the corresponding cache entry will be invalidated. No further modification events for this file are of interest until the file is closed. Hence, the modify event can be added to the ignore mask. Upon receiving the close event, the modify event can be removed from the ignore mask and the file cache entry can be updated.
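The cache scenario above might be sketched as follows. This is a minimal illustration under stated assumptions, not a complete cache: the helper names cache_ignore_modify() and cache_unignore_modify() are hypothetical, the ignore mask is assumed to be manipulated by passing FAN_MARK_IGNORED_MASK to fanotify_mark(2), and FAN_MARK_IGNORED_SURV_MODIFY is added on the assumption that the ignore mask would otherwise be cleared by the next modification of the file.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/fanotify.h>

/* Hypothetical helper: after the first FAN_MODIFY event for 'path',
   suppress further modify events for that file by adding FAN_MODIFY
   to its ignore mask. Returns 0 on success, -1 on error. */
int
cache_ignore_modify(int fan_fd, const char *path)
{
    return fanotify_mark(fan_fd,
                         FAN_MARK_ADD | FAN_MARK_IGNORED_MASK |
                         FAN_MARK_IGNORED_SURV_MODIFY,
                         FAN_MODIFY, AT_FDCWD, path);
}

/* Hypothetical helper: on the close event, remove FAN_MODIFY from the
   ignore mask again so that the next modification is reported.
   Returns 0 on success, -1 on error. */
int
cache_unignore_modify(int fan_fd, const char *path)
{
    return fanotify_mark(fan_fd, FAN_MARK_REMOVE | FAN_MARK_IGNORED_MASK,
                         FAN_MODIFY, AT_FDCWD, path);
}
```

A listener would call the first helper when it invalidates a cache entry and the second one when it refreshes the entry after the close event.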
The entries in the fanotify notification groups refer to files and directories via their inode number and to mounts via their mount ID. If files or directories are renamed or moved within the same mount, the respective entries survive. If files or directories are deleted or moved to another mount or if filesystems or mounts are unmounted, the corresponding entries are deleted.
Two types of events are generated: notification events and permission events. Notification events are merely informative and require no action to be taken by the receiving application with one exception: if a valid file descriptor is provided within a generic event, the file descriptor must be closed. Permission events are requests to the receiving application to decide whether permission for a file access shall be granted. For these events, the recipient must write a response which decides whether access is granted or not.
An event is removed from the event queue of the fanotify group when it has been read. Permission events that have been read are kept in an internal list of the fanotify group until either a permission decision has been taken by writing to the fanotify file descriptor or the fanotify file descriptor is closed.
After a successful read(2), the read buffer contains one or more of the following structures:
struct fanotify_event_metadata {
    __u32 event_len;
    __u8 vers;
    __u8 reserved;
    __u16 metadata_len;
    __aligned_u64 mask;
    __s32 fd;
    __s32 pid;
};
Information records are supplemental pieces of information that may be provided alongside the generic fanotify_event_metadata structure. The flags passed to fanotify_init(2) determine which types of information records may be returned for an event. For example, if a notification group is initialized with FAN_REPORT_FID or FAN_REPORT_DIR_FID, then event listeners should also expect to receive a fanotify_event_info_fid structure alongside the fanotify_event_metadata structure, whereby file handles are used to identify filesystem objects rather than file descriptors.
Information records may also be stacked: the various FAN_REPORT_* flags can be used in conjunction with one another, in which case multiple information records can be returned for an event alongside the generic fanotify_event_metadata structure. For example, if a notification group is initialized with FAN_REPORT_TARGET_FID and FAN_REPORT_PIDFD, then an event listener should expect to receive up to two fanotify_event_info_fid information records and one fanotify_event_info_pidfd information record alongside the generic fanotify_event_metadata structure.
Importantly, fanotify provides no guarantee about the ordering of information records when a notification group is initialized with a stacked configuration. Each information record has a nested structure of type fanotify_event_info_header. Event listeners must inspect the info_type field of this structure to determine the type of information record that was received for a given event.
In cases where an fanotify group identifies filesystem objects by file handles, event listeners should also expect to receive one or more of the below information record objects alongside the generic fanotify_event_metadata structure within the read buffer:
struct fanotify_event_info_fid {
    struct fanotify_event_info_header hdr;
    __kernel_fsid_t fsid;
    unsigned char handle[];
};
In cases where an fanotify group is initialized with FAN_REPORT_PIDFD, event listeners should expect to receive the below information record object alongside the generic fanotify_event_metadata structure within the read buffer:
struct fanotify_event_info_pidfd {
    struct fanotify_event_info_header hdr;
    __s32 pidfd;
};
In case of a FAN_FS_ERROR event, an additional information record describing the error that occurred is returned alongside the generic fanotify_event_metadata structure within the read buffer. This structure is defined as follows:
struct fanotify_event_info_error {
    struct fanotify_event_info_header hdr;
    __s32 error;
    __u32 error_count;
};
All information records contain a nested structure of type fanotify_event_info_header. This structure holds meta-information about the information record that may have been returned alongside the generic fanotify_event_metadata structure. This structure is defined as follows:
struct fanotify_event_info_header {
    __u8 info_type;
    __u8 pad;
    __u16 len;
};
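Because the ordering of stacked information records is not guaranteed, a listener would typically walk all records of an event and dispatch on info_type. The following is a minimal sketch of such a walk. The helper name find_info_record() is hypothetical, and the sketch assumes that the records start metadata_len bytes into the event, end at event_len, and that each record's len includes its header.

```c
#include <stddef.h>
#include <sys/fanotify.h>

/* Return the first information record of type 'info_type' that follows
   the metadata of 'meta', or NULL if there is none. */
const struct fanotify_event_info_header *
find_info_record(const struct fanotify_event_metadata *meta,
                 unsigned int info_type)
{
    const char *p, *end;

    if (meta->event_len < meta->metadata_len)
        return NULL;

    p = (const char *) meta + meta->metadata_len;
    end = (const char *) meta + meta->event_len;

    while ((size_t) (end - p) >= sizeof(struct fanotify_event_info_header)) {
        const struct fanotify_event_info_header *hdr = (const void *) p;

        /* Stop on a malformed record length to avoid looping forever
           or running past the end of the event. */
        if (hdr->len < sizeof(*hdr) || hdr->len > (size_t) (end - p))
            break;

        if (hdr->info_type == info_type)
            return hdr;

        p += hdr->len;
    }
    return NULL;
}
```

The returned pointer can then be cast to the structure that matches the requested type, for example struct fanotify_event_info_fid.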
For performance reasons, it is recommended to use a large buffer size (for example, 4096 bytes), so that multiple events can be retrieved by a single read(2).
The return value of read(2) is the number of bytes placed in the buffer, or -1 in case of an error (but see BUGS).
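A buffer filled by read(2) can hold several events back to back. As a minimal sketch, the helper below (count_events() is a hypothetical name) walks such a buffer with the FAN_EVENT_OK and FAN_EVENT_NEXT macros from <sys/fanotify.h> and counts the events it contains.

```c
#include <stddef.h>
#include <sys/fanotify.h>
#include <sys/types.h>

/* Count the events in 'buf', which holds 'len' bytes as returned by
   read(2) on an fanotify file descriptor. Illustrative helper only. */
size_t
count_events(const void *buf, ssize_t len)
{
    const struct fanotify_event_metadata *meta = buf;
    size_t n = 0;

    /* FAN_EVENT_NEXT() reduces 'len' by the size of the event it
       steps over, so the loop ends when the buffer is exhausted. */
    while (FAN_EVENT_OK(meta, len)) {
        n++;
        meta = FAN_EVENT_NEXT(meta, len);
    }
    return n;
}
```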
The fields of the fanotify_event_metadata structure are as follows:
A program listening to fanotify events can compare the pid field of an event to the PID returned by getpid(2), to determine whether the event was caused by the listener itself or by a file access from another process.
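The comparison might look like the sketch below. The helper name event_is_own() is hypothetical, and the sketch assumes the group was not created with a flag such as FAN_REPORT_TID that changes the meaning of the pid field.

```c
#include <stdbool.h>
#include <sys/fanotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if the event was caused by the calling process itself. */
bool
event_is_own(const struct fanotify_event_metadata *meta)
{
    return (pid_t) meta->pid == getpid();
}
```

A listener that writes to files on a mount it monitors could use such a check to skip its own events.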
The bit mask in mask indicates which events have occurred for a single filesystem object. Multiple bits may be set in this mask, if more than one event occurred for the monitored filesystem object. In particular, consecutive events for the same filesystem object and originating from the same process may be merged into a single event, with the exception that two permission events are never merged into one queue entry.
The bits that may appear in mask are as follows:
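To check for any close event, the following bit mask may be used: FAN_CLOSE, which is a synonym for (FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE).
To check for any move event, the following bit mask may be used: FAN_MOVE, which is a synonym for (FAN_MOVED_FROM | FAN_MOVED_TO).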
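As a small sketch, the combined masks FAN_CLOSE and FAN_MOVE from <sys/fanotify.h> can be tested against the mask field of an event as shown below. The helper names is_close_event() and is_move_event() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/fanotify.h>

/* True if 'mask' reports a close of any kind (writable or not). */
bool
is_close_event(uint64_t mask)
{
    return (mask & FAN_CLOSE) != 0;
}

/* True if 'mask' reports either half of a rename or move. */
bool
is_move_event(uint64_t mask)
{
    return (mask & FAN_MOVE) != 0;
}
```

Testing with a bitwise AND rather than equality matters because several bits may be set in the same mask.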
The following bits may appear in mask only in conjunction with other event type bits:
Information records that are supplied alongside the generic fanotify_event_metadata structure will always contain a nested structure of type fanotify_event_info_header. The fields of the fanotify_event_info_header are as follows:
The fields of the fanotify_event_info_fid structure are as follows:
The fields of the fanotify_event_info_pidfd structure are as follows:
The fields of the fanotify_event_info_error structure are as follows:
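The following macros are provided to iterate over a buffer containing fanotify event metadata returned by a read(2) from an fanotify file descriptor: FAN_EVENT_OK(meta, len), which checks that the remaining length len of the buffer holds a complete event starting at meta, and FAN_EVENT_NEXT(meta, len), which returns a pointer to the next event and subtracts the length of the current event from len.
In addition, there is FAN_EVENT_METADATA_LEN, which gives the size in bytes of the fanotify_event_metadata structure.
For permission events, the application must write(2) a structure of the following form to the fanotify file descriptor: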
struct fanotify_response {
    __s32 fd;
    __u32 response;
};
The fields of this structure are as follows:
If access is denied, the requesting application call will receive an EPERM error. Additionally, if the notification group has been created with the FAN_ENABLE_AUDIT flag, then the FAN_AUDIT flag can be set in the response field. In that case, the audit subsystem will log information about the access decision to the audit logs.
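Answering a permission event might be sketched as follows. The helper name respond_to_event() is hypothetical. It fills a struct fanotify_response with the file descriptor from the event and either FAN_ALLOW or FAN_DENY, then writes it to the fanotify file descriptor.

```c
#include <stdbool.h>
#include <sys/fanotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Answer the permission event 'meta' that was read from 'fan_fd'.
   Returns 0 on success, -1 if the response could not be written. */
int
respond_to_event(int fan_fd, const struct fanotify_event_metadata *meta,
                 bool allow)
{
    struct fanotify_response resp = {
        .fd = meta->fd,
        .response = allow ? FAN_ALLOW : FAN_DENY,
    };

    if (write(fan_fd, &resp, sizeof(resp)) != (ssize_t) sizeof(resp))
        return -1;
    return 0;
}
```

The caller remains responsible for closing meta->fd after responding.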
Errors reported by FAN_FS_ERROR are generic errno values, but not all kinds of error types are reported by all filesystems.
Errors not directly related to a file (for example, superblock corruption) are reported with an invalid file handle. For these errors, the struct file_handle will have the field handle_type set to FILEID_INVALID, and the handle buffer size set to 0.
Since Linux 5.13, the following interfaces can be used to control the amount of kernel resources consumed by fanotify:
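Assuming these limits are exposed as files under /proc/sys/fs/fanotify (for example max_queued_events, max_user_groups, and max_user_marks; the names are not listed in this text and are an assumption here), a current value might be read as in the sketch below. The helper name read_fanotify_limit() is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Read a numeric fanotify limit such as "max_queued_events" from
   /proc/sys/fs/fanotify (assumed location). Returns the value, or -1
   if the file cannot be opened or parsed. */
long
read_fanotify_limit(const char *name)
{
    char path[256];
    long value;
    FILE *fp;
    int n;

    n = snprintf(path, sizeof(path), "/proc/sys/fs/fanotify/%s", name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;

    fclose(fp);
    return value;
}
```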
In addition to the usual errors for write(2), the following errors can occur when writing to the fanotify file descriptor:
The fanotify API does not report file accesses and modifications that may occur because of mmap(2), msync(2), and munmap(2).
Events for directories are created only if the directory itself is opened, read, and closed. Adding, removing, or changing children of a marked directory does not create events for the monitored directory itself.
Fanotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional marks must be created. The FAN_CREATE event can be used for detecting when a subdirectory has been created under a marked directory. An additional mark must then be set on the newly created subdirectory. This approach is racy, because it can lose events that occurred inside the newly created subdirectory, before a mark is added on that subdirectory. Monitoring mounts offers the capability to monitor a whole directory tree in a race-free manner. Monitoring filesystems offers the capability to monitor changes made from any mount of a filesystem instance in a race-free manner.
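The per-directory approach described above might be sketched as follows. The helper name mark_new_subdir() is hypothetical, and the sketch assumes the group was created so that FAN_CREATE events carry the parent directory and entry name (for example with FAN_REPORT_DFID_NAME). The race described above still applies: entries created before the new mark is in place are not reported.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/fanotify.h>

/* Hypothetical helper: add a mark for the subdirectory 'name' that was
   just reported as created inside the directory open at 'parent_fd'.
   Returns 0 on success, -1 on error. */
int
mark_new_subdir(int fan_fd, int parent_fd, const char *name)
{
    return fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_ONLYDIR,
                         FAN_CREATE | FAN_ONDIR, parent_fd, name);
}
```

Where the race is unacceptable, marking the whole mount or filesystem avoids the need for per-directory marks.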
The event queue can overflow. In this case, events are lost.
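A listener can detect that events were lost by looking for the FAN_Q_OVERFLOW bit in the mask of an event. The following is a minimal sketch with a hypothetical helper name, is_queue_overflow().

```c
#include <stdbool.h>
#include <sys/fanotify.h>

/* True if 'meta' is the marker event that the kernel queues when the
   event queue has overflowed and events were dropped. */
bool
is_queue_overflow(const struct fanotify_event_metadata *meta)
{
    return (meta->mask & FAN_Q_OVERFLOW) != 0;
}
```

On overflow, an application that mirrors filesystem state would normally have to rescan the monitored objects, because the lost events cannot be recovered.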
As of Linux 3.17, the following bugs exist:
The following shell session shows an example of running this program. This session involved editing the file /home/user/temp/notes. Before the file was opened, a FAN_OPEN_PERM event occurred. After the file was closed, a FAN_CLOSE_WRITE event occurred. Execution of the program ends when the user presses the ENTER key.
# ./fanotify_example /home
Press enter key to terminate.
Listening for events.
FAN_OPEN_PERM: File /home/user/temp/notes
FAN_CLOSE_WRITE: File /home/user/temp/notes
#define _GNU_SOURCE     /* Needed to get O_LARGEFILE definition */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Read all available fanotify events from the file descriptor 'fd'. */
static void
handle_events(int fd)
{
    const struct fanotify_event_metadata *metadata;
    struct fanotify_event_metadata buf[200];
    ssize_t len;
    char path[PATH_MAX];
    ssize_t path_len;
    char procfd_path[PATH_MAX];
    struct fanotify_response response;

    /* Loop while events can be read from fanotify file descriptor. */
    for (;;) {

        /* Read some events. */
        len = read(fd, buf, sizeof(buf));
        if (len == -1 && errno != EAGAIN) {
            perror("read");
            exit(EXIT_FAILURE);
        }

        /* Check if end of available data reached. */
        if (len <= 0)
            break;

        /* Point to the first event in the buffer. */
        metadata = buf;

        /* Loop over all events in the buffer. */
        while (FAN_EVENT_OK(metadata, len)) {

            /* Check that run-time and compile-time structures match. */
            if (metadata->vers != FANOTIFY_METADATA_VERSION) {
                fprintf(stderr,
                        "Mismatch of fanotify metadata version.\n");
                exit(EXIT_FAILURE);
            }

            /* metadata->fd contains either FAN_NOFD, indicating a
               queue overflow, or a file descriptor (a nonnegative
               integer). Here, we simply ignore queue overflow. */
            if (metadata->fd >= 0) {

                /* Handle open permission event. */
                if (metadata->mask & FAN_OPEN_PERM) {
                    printf("FAN_OPEN_PERM: ");

                    /* Allow file to be opened. */
                    response.fd = metadata->fd;
                    response.response = FAN_ALLOW;
                    write(fd, &response, sizeof(response));
                }

                /* Handle closing of writable file event. */
                if (metadata->mask & FAN_CLOSE_WRITE)
                    printf("FAN_CLOSE_WRITE: ");

                /* Retrieve and print pathname of the accessed file. */
                snprintf(procfd_path, sizeof(procfd_path),
                         "/proc/self/fd/%d", metadata->fd);
                path_len = readlink(procfd_path, path,
                                    sizeof(path) - 1);
                if (path_len == -1) {
                    perror("readlink");
                    exit(EXIT_FAILURE);
                }

                path[path_len] = '\0';
                printf("File %s\n", path);

                /* Close the file descriptor of the event. */
                close(metadata->fd);
            }

            /* Advance to next event. */
            metadata = FAN_EVENT_NEXT(metadata, len);
        }
    }
}

int
main(int argc, char *argv[])
{
    char buf;
    int fd, poll_num;
    nfds_t nfds;
    struct pollfd fds[2];

    /* Check mount point is supplied. */
    if (argc != 2) {
        fprintf(stderr, "Usage: %s MOUNT\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    printf("Press enter key to terminate.\n");

    /* Create the file descriptor for accessing the fanotify API. */
    fd = fanotify_init(FAN_CLOEXEC | FAN_CLASS_CONTENT | FAN_NONBLOCK,
                       O_RDONLY | O_LARGEFILE);
    if (fd == -1) {
        perror("fanotify_init");
        exit(EXIT_FAILURE);
    }

    /* Mark the mount for:
       - permission events before opening files
       - notification events after closing a write-enabled
         file descriptor. */
    if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT,
                      FAN_OPEN_PERM | FAN_CLOSE_WRITE, AT_FDCWD,
                      argv[1]) == -1) {
        perror("fanotify_mark");
        exit(EXIT_FAILURE);
    }

    /* Prepare for polling. */
    nfds = 2;

    fds[0].fd = STDIN_FILENO;       /* Console input */
    fds[0].events = POLLIN;

    fds[1].fd = fd;                 /* Fanotify input */
    fds[1].events = POLLIN;

    /* This is the loop to wait for incoming events. */
    printf("Listening for events.\n");

    while (1) {
        poll_num = poll(fds, nfds, -1);
        if (poll_num == -1) {
            if (errno == EINTR)     /* Interrupted by a signal */
                continue;           /* Restart poll() */

            perror("poll");         /* Unexpected error */
            exit(EXIT_FAILURE);
        }

        if (poll_num > 0) {
            if (fds[0].revents & POLLIN) {

                /* Console input is available: empty stdin and quit. */
                while (read(STDIN_FILENO, &buf, 1) > 0 && buf != '\n')
                    continue;
                break;
            }

            if (fds[1].revents & POLLIN) {

                /* Fanotify events are available. */
                handle_events(fd);
            }
        }
    }

    printf("Listening for events stopped.\n");
    exit(EXIT_SUCCESS);
}
The following shell sessions show two different invocations of this program, with different actions performed on a watched object.
The first session shows a mark being placed on /home/user. This is followed by the creation of a regular file, /home/user/testfile.txt. This results in a FAN_CREATE event being generated and reported against the file's parent watched directory object and with the created file name. Program execution ends once all events captured within the buffer have been processed.
# ./fanotify_fid /home/user
Listening for events.
FAN_CREATE (file created):
Directory /home/user has been modified.
Entry 'testfile.txt' is not a subdirectory.
All events processed successfully. Program exiting.
$ touch /home/user/testfile.txt # In another terminal
The second session shows a mark being placed on /home/user. This is followed by the creation of a directory, /home/user/testdir. This specific action results in a FAN_CREATE event being generated and is reported with the FAN_ONDIR flag set and with the created directory name.
# ./fanotify_fid /home/user
Listening for events.
FAN_CREATE | FAN_ONDIR (subdirectory created):
Directory /home/user has been modified.
Entry 'testdir' is a subdirectory.
All events processed successfully. Program exiting.
$ mkdir -p /home/user/testdir # In another terminal
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/fanotify.h>
#include <unistd.h>

#define BUF_SIZE 256

int
main(int argc, char *argv[])
{
    int fd, ret, event_fd, mount_fd;
    ssize_t len, path_len;
    char path[PATH_MAX];
    char procfd_path[PATH_MAX];
    char events_buf[BUF_SIZE];
    struct file_handle *file_handle;
    struct fanotify_event_metadata *metadata;
    struct fanotify_event_info_fid *fid;
    const char *file_name;
    struct stat sb;

    if (argc != 2) {
        fprintf(stderr, "Invalid number of command line arguments.\n");
        exit(EXIT_FAILURE);
    }

    mount_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
    if (mount_fd == -1) {
        perror(argv[1]);
        exit(EXIT_FAILURE);
    }

    /* Create an fanotify file descriptor with FAN_REPORT_DFID_NAME as
       a flag so that program can receive fid events with directory
       entry name. */
    fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_DFID_NAME, 0);
    if (fd == -1) {
        perror("fanotify_init");
        exit(EXIT_FAILURE);
    }

    /* Place a mark on the filesystem object supplied in argv[1]. */
    ret = fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_ONLYDIR,
                        FAN_CREATE | FAN_ONDIR,
                        AT_FDCWD, argv[1]);
    if (ret == -1) {
        perror("fanotify_mark");
        exit(EXIT_FAILURE);
    }

    printf("Listening for events.\n");

    /* Read events from the event queue into a buffer. */
    len = read(fd, events_buf, sizeof(events_buf));
    if (len == -1 && errno != EAGAIN) {
        perror("read");
        exit(EXIT_FAILURE);
    }

    /* Process all events within the buffer. */
    for (metadata = (struct fanotify_event_metadata *) events_buf;
            FAN_EVENT_OK(metadata, len);
            metadata = FAN_EVENT_NEXT(metadata, len)) {
        fid = (struct fanotify_event_info_fid *) (metadata + 1);
        file_handle = (struct file_handle *) fid->handle;

        /* Ensure that the event info is of the correct type. */
        if (fid->hdr.info_type == FAN_EVENT_INFO_TYPE_FID ||
            fid->hdr.info_type == FAN_EVENT_INFO_TYPE_DFID) {
            file_name = NULL;
        } else if (fid->hdr.info_type == FAN_EVENT_INFO_TYPE_DFID_NAME) {
            /* The entry name follows the opaque handle bytes. */
            file_name = (const char *) file_handle->f_handle +
                        file_handle->handle_bytes;
        } else {
            fprintf(stderr, "Received unexpected event info type.\n");
            exit(EXIT_FAILURE);
        }

        if (metadata->mask == FAN_CREATE)
            printf("FAN_CREATE (file created):\n");

        if (metadata->mask == (FAN_CREATE | FAN_ONDIR))
            printf("FAN_CREATE | FAN_ONDIR (subdirectory created):\n");

        /* metadata->fd is set to FAN_NOFD when the group identifies
           objects by file handles. To obtain a file descriptor for
           the file object corresponding to an event you can use the
           struct file_handle that's provided within the
           fanotify_event_info_fid in conjunction with the
           open_by_handle_at(2) system call. A check for ESTALE is
           done to accommodate for the situation where the file handle
           for the object was deleted prior to this system call. */
        event_fd = open_by_handle_at(mount_fd, file_handle, O_RDONLY);
        if (event_fd == -1) {
            if (errno == ESTALE) {
                printf("File handle is no longer valid. "
                       "File has been deleted\n");
                continue;
            } else {
                perror("open_by_handle_at");
                exit(EXIT_FAILURE);
            }
        }

        snprintf(procfd_path, sizeof(procfd_path), "/proc/self/fd/%d",
                 event_fd);

        /* Retrieve and print the path of the modified dentry. */
        path_len = readlink(procfd_path, path, sizeof(path) - 1);
        if (path_len == -1) {
            perror("readlink");
            exit(EXIT_FAILURE);
        }

        path[path_len] = '\0';
        printf("\tDirectory '%s' has been modified.\n", path);

        if (file_name) {
            ret = fstatat(event_fd, file_name, &sb, 0);
            if (ret == -1) {
                if (errno != ENOENT) {
                    perror("fstatat");
                    exit(EXIT_FAILURE);
                }
                printf("\tEntry '%s' does not exist.\n", file_name);
            } else if ((sb.st_mode & S_IFMT) == S_IFDIR) {
                printf("\tEntry '%s' is a subdirectory.\n", file_name);
            } else {
                printf("\tEntry '%s' is not a subdirectory.\n",
                       file_name);
            }
        }

        /* Close associated file descriptor for this event. */
        close(event_fd);
    }

    printf("All events processed successfully. Program exiting.\n");
    exit(EXIT_SUCCESS);
}