1. Design The goal of SEGMEXEC is to implement the non-executable page feature using the segmentation logic of IA-32 based CPUs. On IA-32 Linux runs in protected mode with paging enabled. This means that for every memory access (be that instruction fetch or normal data access) the CPU will perform a two step address translation. In the first step the logical address decoded from the instruction is translated into a linear (or in another terminology, virtual) address. This translation is done by the segmentation logic whose details are explained in a separate document. While Linux effectively does not use segmentation by creating 0 based and 4 GB limited segments for both code and data accesses (therefore logical addresses are the same as linear addresses), it is possible to set up segments that allow to implement non-executable pages. The basic idea is that we divide the 3 GB userland linear address space into two equal halves and use one to store mappings meant for data access (that is, we define a data segment descriptor to cover the 0-1.5 GB linear address range) and the other for storing mappings for execution (that is, we define a code segment descriptor to cover the 1.5-3 GB linear address range). Since an executable mapping can be used for data accesses as well, we will have to ensure that such mappings are visible in both segments and mirror each other. This setup will then separate data accesses from instruction fetches in the sense that they will hit different linear addresses and therefore allow for control/intervention based on the access type. In particular, if a data-only (and therefore non-executable) mapping is present only in the 0-1.5 GB linear address range, then instruction fetches to the same logical addresses will end up in the 1.5-3 GB linear address range and will raise a page fault hence allow detecting such execution attempts. 2. Implementation The core of SEGMEXEC is vma mirroring which is discussed in a separate document. The mirrors for executable file mappings are set up in do_mmap() (an inline function defined in include/linux/mm.h) except for a special case with RANDEXEC (see separate document). do_mmap() is the one common function called by both userland and kernel originated mapping requests. The special code and data segment descriptors are placed into a new GDT called gdt_table2 in arch/i386/kernel/head.S. The separate GDT is needed for two reasons: first it simplifies the implementation in that the CS/SS selectors used for userland do not have to change, and second, this setup prevents a simple attack that a single GDT setup would be subject to (the retf and other instructions could be abused to break out of the restricted code segment used for SEGMEXEC tasks). Since the GDT stores the userland code/data descriptors which are different for SEGMEXEC tasks, we have to modify the low-level context switching code called __switch_to() in arch/i386/kernel/process.c and the last steps of load_elf_binary() in fs/binfmt_elf.c (where the task is first prepared to execute in userland). The GDT also has APM specific descriptors which are set up at runtime and must be propagated to the second GDT as well (in arch/i386/kernel/apm.c). Finally the GDT stores also the per CPU TSS and LDT descriptors whose content must be synchronized between the two GDTs (in set_tss_desc() and set_ldt_desc() in arch/i386/kernel/traps.c). Since the kernel allows userland to define its own code segment descriptors in the LDT, we have to disallow it since it could be used to break out of the SEGMEXEC specific restricted code segment (the extra checks are in write_ldt() in arch/i386/kernel/ldt.c).