1. Design

The goal of MPROTECT is to help prevent the introduction of new executable code into the task's address space. This is accomplished by restricting the mmap() and mprotect() interfaces.

The restrictions prevent
- creating executable anonymous mappings
- creating executable/writable file mappings
- making an executable/read-only file mapping writable except for performing relocations on an ET_DYN ELF file (non-PIC shared library)
- making a non-executable mapping executable

To understand the restrictions consider the writability/executability of a mapping as state information. This state is stored in the vma structure in the vm_flags field and determines whether the given area (and consequently each page covered by it) is currently writable/executable and/or can be made writable/executable (by using mprotect() on the area). The flags that describe each attribute are: VM_WRITE, VM_EXEC, VM_MAYWRITE and VM_MAYEXEC.

These four attributes mean that any mapping (vma) can be in 16 different states (for our discussion at least, we ignore the other attributes here), and our goal can be achieved by restricting what state a vma can be in or change to throughout its lifetime.

Introducing new executable code into a mapping is impossible in any of the following ('good') states:

   VM_WRITE
   VM_MAYWRITE
   VM_WRITE | VM_MAYWRITE
   VM_EXEC
   VM_MAYEXEC
   VM_EXEC | VM_MAYEXEC

In every other state it is either possible to directly write new executable code into the mapping or the mapping can be changed by mprotect() so that it becomes writable/executable.
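The state counting above can be checked mechanically. The following is a minimal user-space sketch, not kernel code: the bit values and the helper names (state_from_index, can_introduce_code, count_good_states) are assumptions made for illustration, and only the distinctness of the four bits matters. It treats a state as bad when the mapping is, or can be made, both writable and executable.

```c
#include <stdbool.h>

/* Illustrative bit values for this sketch; only their distinctness matters. */
#define VM_WRITE    0x02UL
#define VM_EXEC     0x04UL
#define VM_MAYWRITE 0x20UL
#define VM_MAYEXEC  0x40UL

/* The four attributes, used to enumerate all 16 combinations. */
static const unsigned long vm_bits[4] = {
    VM_WRITE, VM_EXEC, VM_MAYWRITE, VM_MAYEXEC
};

/* Build a flag word from a 4-bit index (0..15). */
static unsigned long state_from_index(unsigned int idx)
{
    unsigned long flags = 0;

    for (unsigned int i = 0; i < 4; i++)
        if (idx & (1u << i))
            flags |= vm_bits[i];
    return flags;
}

/* True if the mapping is, or can become, both writable and executable. */
static bool can_introduce_code(unsigned long flags)
{
    bool writable   = (flags & (VM_WRITE | VM_MAYWRITE)) != 0;
    bool executable = (flags & (VM_EXEC | VM_MAYEXEC)) != 0;

    return writable && executable;
}

/* Count the non-empty states in which no new code can be introduced. */
static unsigned int count_good_states(void)
{
    unsigned int good = 0;

    /* Index 0 (no attribute set) is skipped: it is not in the list above. */
    for (unsigned int idx = 1; idx < 16; idx++)
        if (!can_introduce_code(state_from_index(idx)))
            good++;
    return good;
}
```

Under these assumptions count_good_states() yields the six states listed above; the remaining nine non-empty states all mix a write attribute with an execute attribute.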
Note that the default kernel behaviour does already prevent certain states (in particular, a mapping cannot have VM_WRITE and VM_EXEC without also having VM_MAYWRITE and VM_MAYEXEC, respectively) so this leaves us with 4 good states:

   VM_MAYWRITE
   VM_MAYEXEC
   VM_WRITE | VM_MAYWRITE
   VM_EXEC | VM_MAYEXEC

Let's see now what kind of mappings the kernel creates and what MPROTECT has to change in them:

- anonymous mappings (stack, brk() and mmap() controlled heap): these are created in the VM_WRITE | VM_EXEC | VM_MAYWRITE | VM_MAYEXEC state which is not a good state. Since these mappings have to be writable, we can only change the executable status (this will still break real life applications, see later what could be done about them); MPROTECT simply changes their state to VM_WRITE | VM_MAYWRITE,

- shared memory mappings: these are created in the VM_WRITE | VM_MAYWRITE state which is a good state,

- file mappings: similarly to anonymous mappings, these can be created in all the bad states (list omitted for brevity); in particular the kernel grants VM_MAYWRITE | VM_MAYEXEC to any mapping regardless of what rights were requested. In order to break as few applications as possible yet still achieve our goal, we decided to use the following states for file mappings:

   - VM_WRITE | VM_MAYWRITE or VM_MAYWRITE if PROT_WRITE was requested at mmap() time
   - VM_EXEC | VM_MAYEXEC if PROT_WRITE was not requested.

Effectively executable mappings are forced to be non-writable and writable mappings are forced to be non-executable (including the impossibility to change this state during their existence). There is one exception to this which is needed in order for the dynamic linker to be able to perform relocations on the executable segment of non-PIC ELF files. If one can ensure that no such libraries exist on his system (libraries should be PIC anyway), then this exception can be removed.
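The per-mapping policy described above can be summarised as a small user-space model. This is only a sketch under stated assumptions: the enum, the function names and the bit values are hypothetical, the VM_MAYWRITE-only variant for file mappings is not modelled, and the relocation exception is left out.

```c
#include <stdbool.h>

/* Illustrative bit values for this sketch; only their distinctness matters. */
#define VM_WRITE    0x02UL
#define VM_EXEC     0x04UL
#define VM_MAYWRITE 0x20UL
#define VM_MAYEXEC  0x40UL

/* Hypothetical classification of the mapping types discussed in the text. */
enum mapping_kind { KIND_ANON, KIND_SHM, KIND_FILE };

/* State a new mapping ends up in under the policy sketched in the text. */
static unsigned long mprotect_initial_state(enum mapping_kind kind,
                                            bool write_requested)
{
    switch (kind) {
    case KIND_ANON:
        /* Default is W | X | MAYW | MAYX; strip both execute attributes. */
        return (VM_WRITE | VM_EXEC | VM_MAYWRITE | VM_MAYEXEC)
               & ~(VM_EXEC | VM_MAYEXEC);
    case KIND_SHM:
        /* Already created in a good state; nothing to change. */
        return VM_WRITE | VM_MAYWRITE;
    case KIND_FILE:
        /* Writable file mappings lose execute rights and vice versa. */
        if (write_requested)
            return VM_WRITE | VM_MAYWRITE;
        return VM_EXEC | VM_MAYEXEC;
    }
    return 0;
}

/* Kernel's own rule: VM_WRITE needs VM_MAYWRITE, VM_EXEC needs VM_MAYEXEC. */
static bool kernel_consistent(unsigned long flags)
{
    if ((flags & VM_WRITE) && !(flags & VM_MAYWRITE))
        return false;
    if ((flags & VM_EXEC) && !(flags & VM_MAYEXEC))
        return false;
    return true;
}

/* True for exactly the four good states that the kernel can produce. */
static bool is_good_state(unsigned long flags)
{
    bool writable   = (flags & (VM_WRITE | VM_MAYWRITE)) != 0;
    bool executable = (flags & (VM_EXEC | VM_MAYEXEC)) != 0;

    /* Exactly one of the two capabilities may be present. */
    return kernel_consistent(flags) && (writable != executable);
}
```

Every state returned by mprotect_initial_state() passes is_good_state(), whereas the default anonymous state VM_WRITE | VM_EXEC | VM_MAYWRITE | VM_MAYEXEC does not.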
Note that the ET_DYN ELF executables suggested for use under RANDMMAP should also be PIC (for this one needs a PIC version of crt1.o however).

The above restrictions ensure that the only way to introduce executable code into a task's address space is by mapping a file into memory while requesting PROT_EXEC as well. For an attacker it means that he has to be able to create/write to a file on the target system before he can mmap() it into the attacked task's address space. There are various ways of preventing/detecting such avenues of attack but they are beyond the scope of the PaX project.

As mentioned before, the MPROTECT restrictions break existing applications that rely on the bad vma states. Most often this concerns anonymous mappings, which MPROTECT makes non-executable: they are used for satisfying higher-level memory allocation requests (such as the malloc() family in C) and some applications assume them to be executable (java, gcc trampolines, etc). One way of allowing such applications to work under MPROTECT would be to extend the mmap() interface and allow setting the VM_MAY* flags to certain states. The following example demonstrates how an application would make use of this change:

- mmap(..., PROT_READ | PROT_WRITE | PROT_MAYREAD | PROT_MAYEXEC, ...)
- generate code into the above area
- mprotect(..., PROT_READ | PROT_EXEC)

Note that PROT_EXEC is neither requested nor allowed in the initial mmap() call, therefore application programmers are forced to call mprotect() explicitly and hence cannot accidentally violate the MPROTECT policy.

2. Implementation

The first two restrictions are implemented in do_mmap_pgoff() and do_brk() in mm/mmap.c while the other two are in sys_mprotect() in mm/mprotect.c (non-PIC ELF libraries are handled by pax_handle_maywrite()). Since MPROTECT makes sense only when non-executable pages are enforced as well, the restrictions are enabled only when either of PAGEEXEC or SEGMEXEC is enabled for the given task.
Furthermore some of the restrictions are already meaningful/necessary for enforcing just the non-executable pages, therefore they are applied even if MPROTECT itself is not enabled (but enabling MPROTECT is necessary to complete the feature).

The special case of allowing non-PIC ELF relocations is managed by pax_handle_maywrite() in mm/mprotect.c. The logic is quite straightforward: first we verify that the mapping for which PROT_WRITE was requested is a candidate for relocations (it has to be an executable file mapping that has not yet been made writable), then we check that the backing file is an ET_DYN ELF file whose dynamic table has an entry showing the need for text relocations. If the request is to be allowed, we simply change the mapping state so that the rest of the do_mprotect() logic allows it, and we also set the VM_MAYNOTWRITE flag, which disallows further PROT_WRITE requests on the mapping.
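The decision sequence for the relocation exception can be modelled in user space as follows. This is a simplified sketch and not the code of pax_handle_maywrite(): struct vma_model, model_handle_maywrite() and all bit values (in particular the one chosen for VM_MAYNOTWRITE) are hypothetical, and the ELF inspection is reduced to two precomputed booleans.

```c
#include <stdbool.h>

/* Illustrative bit values for this sketch; only their distinctness matters. */
#define VM_WRITE       0x0002UL
#define VM_EXEC        0x0004UL
#define VM_MAYWRITE    0x0020UL
#define VM_MAYNOTWRITE 0x1000UL   /* hypothetical value chosen for the sketch */

/* Hypothetical stand-in for a vma plus facts about its backing file. */
struct vma_model {
    unsigned long flags;
    bool file_backed;        /* the mapping is backed by a file */
    bool backing_is_et_dyn;  /* the ELF header type is ET_DYN */
    bool has_textrel;        /* the dynamic table signals text relocations */
};

/*
 * Decide whether a PROT_WRITE request on this mapping may be granted for
 * the sake of text relocations. On success the modelled state is updated
 * so that the request passes once and later requests are refused.
 */
static bool model_handle_maywrite(struct vma_model *vma)
{
    /* Candidate check: an executable file mapping ... */
    if (!vma->file_backed || !(vma->flags & VM_EXEC))
        return false;

    /* ... that has not yet been made writable. */
    if (vma->flags & (VM_WRITE | VM_MAYNOTWRITE))
        return false;

    /* The backing file must be an ET_DYN ELF needing text relocations. */
    if (!vma->backing_is_et_dyn || !vma->has_textrel)
        return false;

    /* Allow this request and mark the mapping against further ones. */
    vma->flags |= VM_MAYWRITE | VM_MAYNOTWRITE;
    return true;
}
```

In this model the first request on a non-PIC library's executable segment succeeds, a second request on the same mapping fails because VM_MAYNOTWRITE is already set, and a PIC library (no text relocations) is never granted write access.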