NuttX Protected Build
Warning: Migrated from https://cwiki.apache.org/confluence/display/NUTTX/NuttX+Protected+Build
The Traditional “Flat” Build
The traditional NuttX build is a “flat” build. By flat, I mean that when you build NuttX, you end up with a single “blob” called nuttx. All of the components of the build reside in the same address space. All components of the build can access all other components of the build.
The “Two Pass” Protected Build
The NuttX protected build, on the other hand, is a “two-pass” build and generates two “blobs”: (1) a separately compiled and linked kernel blob called, again, nuttx and (2) a separately compiled and linked user blob called nuttx_user.elf (in the existing build configurations). The user blob is created on pass 1 and the kernel blob is created on pass 2.
These two make commands are equivalent:
make
make pass1 pass2
But the second is clearer and I prefer to use it for the protected build. In the second case, the user and kernel blobs are built separately; in the first, the kernel and user blob builds may be intermixed, which can be somewhat confusing. You can also build the user and kernel blobs individually with the following commands (pass1 builds the user blob and pass2 builds the kernel blob):
make pass1
make pass2
At the end of the build, there will be several files in the top-level NuttX build directory. From Pass 1:
nuttx_user.elf. The pass1 user-space ELF file
nuttx_user.hex. The pass1 Intel HEX format file (selected in defconfig)
User.map. Symbols in the user-space ELF file
From Pass 2:
nuttx. The pass2 kernel-space ELF file
nuttx.hex. The pass2 Intel HEX format file (selected in defconfig)
System.map. Symbols in the kernel-space ELF file
The Memory Protection Unit
If the MCU supports a Memory Protection Unit (MPU), then the logic within the kernel blob executes in kernel mode, i.e., with all privileges. These privileged threads can access all memory, all CPU instructions, and all MCU registers. The logic within the user blob, on the other hand, executes in user mode with certain restrictions enforced by the MCU and by the MPU. The MCU may restrict access to certain registers and machine instructions; with the MPU, access to all kernel memory resources is prohibited from the user logic. This includes the kernel blob’s FLASH, .bss/.data storage, and the kernel heap memory.
Advantages of the Protected Build
The advantages of such a protected build are (1) security and (2) modularity. Since the kernel resources are protected, it will be much less likely that a misbehaving task will crash the system or that a wild pointer access will corrupt critical memory. This security also provides a safer environment in which to execute 3rd party software and prevents “snooping” into the kernel memory from the hosted applications.
Modularity is assured because there is strict control of the exposed kernel interfaces. In the flat build, all symbols are exposed and there is no enforcement of a kernel API. With the protected build, on the other hand, all interactions with the kernel from the user application logic must use system calls (or syscalls) to interface with the OS. A system call is necessary to transition from user mode to kernel mode; all user-space operating system interfaces are via syscall proxies. Then, while in kernel mode, the kernel system call handler performs the OS service requested by the application. At the conclusion of system call processing, user privileges are restored and control is returned to the user application. Since the only interactions with the kernel can be through supported system calls, modularity of the OS is guaranteed.
User-Space Proxies/Kernel-Space Stubs
The same OS interfaces are exposed to the application in both the “flat” build and the protected build. The difference is that in the protected build, the user code interfaces with a proxy for the OS function. For example, here is what a proxy for the OS getpid() interface looks like:
#include <unistd.h>
#include <syscall.h>

pid_t getpid(void)
{
  /* Trap into the kernel; the kernel-side stub then calls the real getpid() */

  return (pid_t)sys_call0(SYS_getpid);
}
Thus the getpid() proxy is a stand-in for the real OS getpid() interface that executes a system call so the kernel code can perform the real getpid() operation on behalf of the user application. Proxies are auto-generated for all exported OS interfaces using the CSV file syscall/syscall.csv and the program tools/mksyscalls. Similarly, on the kernel side, there are auto-generated stubs that map the system calls back into real OS calls. These, however, are internal to the OS and the implementation may be architecture-specific. See the README.txt files in those directories for further information.
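To make the proxy/stub split concrete, here is a small host-runnable C sketch that simulates it. Everything in it is hypothetical and is not the NuttX implementation: sim_syscall0() stands in for the trap (on real hardware a software interrupt whose handler runs in kernel mode, here an ordinary function call), stub_getpid() and g_stub_table stand in for the auto-generated kernel-side stubs, and the numbering that places exported calls after SYS_RESERVED (mirroring CONFIG_SYS_RESERVED) is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical numbering: the first SYS_RESERVED numbers (mirroring
 * CONFIG_SYS_RESERVED) are kept for architecture-internal use, so the
 * exported OS calls are numbered after them. */
#define SYS_RESERVED   8
#define SYS_getpid     (SYS_RESERVED + 0)
#define SYS_NSYSCALLS  1

typedef uintptr_t (*syscall_stub_t)(void);

/* "Kernel" side: the real OS implementation (simulated). */
static int g_current_pid = 42;

static int kernel_getpid(void)
{
  return g_current_pid;
}

/* Kernel-side stub: maps the system call back onto the real OS call. */
static uintptr_t stub_getpid(void)
{
  return (uintptr_t)kernel_getpid();
}

/* Dispatch table indexed by (syscall number - SYS_RESERVED). */
static const syscall_stub_t g_stub_table[SYS_NSYSCALLS] =
{
  stub_getpid,
};

/* Stand-in for the trap. On real hardware this is a software interrupt
 * whose handler runs in kernel mode; here it is a plain function call. */
static uintptr_t sim_syscall0(unsigned int nbr)
{
  if (nbr < SYS_RESERVED || nbr >= SYS_RESERVED + SYS_NSYSCALLS)
    {
      return (uintptr_t)-1;   /* reserved or unknown syscall number */
    }

  return g_stub_table[nbr - SYS_RESERVED]();
}

/* User-side proxy, shaped like the generated getpid() proxy above. */
static int proxy_getpid(void)
{
  return (int)sim_syscall0(SYS_getpid);
}
```

Calling proxy_getpid() goes through the simulated trap and the stub table and returns the "kernel's" value; a number inside the reserved range is rejected by the dispatcher rather than reaching a stub.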
Combining Intel HEX Files
One issue that you may face is that the two-pass build creates two FLASH images. Some debuggers that I use will allow me to write each image to FLASH separately. Others expect a single Intel HEX image. In this latter case, you may need to combine the two Intel HEX files into one. Here is how you can do that:
The tail of the nuttx.hex file should look something like this (with my comments and spaces added):
$ tail nuttx.hex
# 00, data records
...
:10 9DC0 00 01000000000800006400020100001F0004
:10 9DD0 00 3B005A0078009700B500D400F300110151
:08 9DE0 00 30014E016D0100008D
# 05, Start Linear Address Record
:04 0000 05 0800 0419 D2
# 01, End Of File record
:00 0000 01 FF
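Use an editor such as vi to remove the 05 and 01 records. The 01 End Of File record in particular must not remain in the middle of the combined file, because a loader stops reading at the first End Of File record it sees.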
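Each record in the listing above has the form :LLAAAATT<data>CC: a byte count, a 16-bit address offset, a record type, the data bytes, and a checksum chosen so that all bytes of the record sum to zero modulo 256. The following C sketch decodes one record and verifies its checksum, which is a convenient way to identify the 05 and 01 records before removing them. It skips spaces so that annotated lines like the ones above can be fed to it directly; the names ihex_record, hexval(), and ihex_parse() are illustrative and not part of NuttX.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded Intel HEX record ":LLAAAATT<data>CC". */
struct ihex_record
{
  uint8_t  length;     /* LL: number of data bytes */
  uint16_t offset;     /* AAAA: 16-bit address offset */
  uint8_t  type;       /* TT: 00 data, 01 EOF, 04 ext. linear, 05 start linear */
  uint8_t  data[255];  /* the LL data bytes */
  uint8_t  checksum;   /* CC */
};

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Decode one record. Spaces are ignored so annotated listings can be
 * parsed. Returns 0 on success, -1 if malformed or the checksum fails. */
static int ihex_parse(const char *line, struct ihex_record *rec)
{
  uint8_t      bytes[260];   /* LL + AAAA + TT + up to 255 data + CC */
  size_t       n = 0;
  unsigned int sum = 0;
  size_t       i;

  if (*line++ != ':')
    {
      return -1;
    }

  while (*line != '\0' && *line != '\r' && *line != '\n')
    {
      int hi;
      int lo;

      if (*line == ' ')
        {
          line++;
          continue;
        }

      hi = hexval(line[0]);
      lo = (hi < 0) ? -1 : hexval(line[1]);
      if (lo < 0 || n >= sizeof(bytes))
        {
          return -1;
        }

      bytes[n++] = (uint8_t)((hi << 4) | lo);
      line += 2;
    }

  /* The byte count must match, and all bytes must sum to 0 mod 256. */
  if (n < 5 || n != (size_t)bytes[0] + 5)
    {
      return -1;
    }

  for (i = 0; i < n; i++)
    {
      sum += bytes[i];
    }

  if ((sum & 0xff) != 0)
    {
      return -1;
    }

  rec->length   = bytes[0];
  rec->offset   = (uint16_t)((bytes[1] << 8) | bytes[2]);
  rec->type     = bytes[3];
  memcpy(rec->data, &bytes[4], rec->length);
  rec->checksum = bytes[n - 1];
  return 0;
}
```

For example, ihex_parse(":04 0000 05 0800 0419 D2", &rec) succeeds with rec.type equal to 5 and data bytes 08 00 04 19, i.e., a start address of 0x08000419, while comment lines such as "# 05, Start Linear Address Record" are rejected.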
The head of the nuttx_user.hex file should look something like this (again with my comments and spaces added):
$ head nuttx_user.hex
# 04, Extended Linear Address Record
:02 0000 04 0801 F1
# 00, data records
:10 8000 00 BD89 01084C800108C8110208D01102087E
:10 8010 00 0010 00201C1000201C1000203C16002026
:10 8020 00 4D80 01085D80010869800108ED83010829
...
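Nothing needs to be done here; the nuttx_user.hex file should be fine as-is. It begins with its own 04 Extended Linear Address record, so its data records are loaded at the correct addresses even when they follow the kernel records.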
Combine the edited nuttx.hex and the unedited nuttx_user.hex files to produce a single combined hex file:
$ cat nuttx.hex nuttx_user.hex >combined.hex
Then use the combined.hex file with your FLASH/JTAG tool. If you do this a lot, you will probably want to invest a little time to develop a tool to automate these steps.
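A minimal C sketch of such a tool follows, assuming ordinary Intel HEX files without the annotation spaces shown in the listings. It mirrors the manual procedure exactly: copy nuttx.hex while dropping its 05 and 01 records, then append nuttx_user.hex unchanged. The function names record_type() and combine_hex() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Record type (the TT field of ":LLAAAATT...") of one Intel HEX line, or
 * -1 if the line is not a record. Assumes no embedded spaces. */
static int record_type(const char *line)
{
  unsigned int type;

  if (line[0] != ':' || strlen(line) < 9)
    {
      return -1;
    }

  if (sscanf(line + 7, "%2x", &type) != 1)
    {
      return -1;
    }

  return (int)type;
}

/* Write 'kernel' to 'out' minus its 05 (Start Linear Address) and 01
 * (End Of File) records, then append 'user' unchanged. Returns the number
 * of kernel records dropped, or -1 on an I/O error. */
static int combine_hex(FILE *kernel, FILE *user, FILE *out)
{
  char line[600];   /* longest record is 521 characters plus line ending */
  int  dropped = 0;

  while (fgets(line, sizeof(line), kernel) != NULL)
    {
      int type = record_type(line);

      if (type == 0x05 || type == 0x01)
        {
          dropped++;
          continue;
        }

      if (fputs(line, out) == EOF)
        {
          return -1;
        }
    }

  while (fgets(line, sizeof(line), user) != NULL)
    {
      if (fputs(line, out) == EOF)
        {
          return -1;
        }
    }

  if (ferror(kernel) || ferror(user) || fflush(out) != 0)
    {
      return -1;
    }

  return dropped;
}
```

A small main() could open nuttx.hex and nuttx_user.hex for reading and combined.hex for writing, then call combine_hex() on the three streams.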
Files and Directories
Here is a summary of directories and files used by the STM32F4Discovery protected build:
boards/arm/stm32/stm32f4discovery/configs/kostest. This is the kernel mode OS test configuration. The two standard configuration files can be found in this directory: (1) defconfig and (2) Make.defs.
boards/arm/stm32/stm32f4discovery/kernel. This is the first pass build directory. The Makefile in this directory is invoked to produce the pass1 object (nuttx_user.elf in this case). The second pass object is created by arch/arm/src/Makefile. Also in this directory is the file userspace.c. The user-mode blob contains a header that includes information needed by the kernel blob in order to interface with the user code. That header is defined by this file.
boards/arm/stm32/stm32f4discovery/scripts. Linker scripts for the kernel mode build are found in this directory. This includes (1) memory.ld, which holds the common memory map, (2) user-space.ld, which is used for linking the pass1 user-mode blob, and (3) kernel-space.ld, which is used for linking the pass2 kernel-mode blob.
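The exact layout of that header is board- and version-specific. The following sketch is hypothetical throughout (struct user_header, USER_HEADER_MAGIC, and kernel_find_user() are not NuttX definitions) and only shows the idea: the user blob exports a table of entry points, including its memory allocator, and the kernel locates the table at a known address and validates it before calling through it. On the target that address would be CONFIG_NUTTX_USERSPACE; on a host the sketch simply passes a pointer.

```c
#include <stdint.h>
#include <stdlib.h>

#define USER_HEADER_MAGIC 0x55534552u   /* hypothetical sanity value */

/* Hypothetical user-blob header layout (illustrative field names). */
struct user_header
{
  uint32_t magic;                           /* identifies a valid header */
  int    (*entry)(int argc, char *argv[]);  /* user application entry point */
  void  *(*malloc_fn)(size_t size);         /* user-space allocator */
  void   (*free_fn)(void *mem);
};

/* ---- User blob side (the role played by the board's userspace.c) ---- */

static int user_main(int argc, char *argv[])
{
  (void)argc;
  (void)argv;
  return 0;
}

const struct user_header g_user_header =
{
  .magic     = USER_HEADER_MAGIC,
  .entry     = user_main,
  .malloc_fn = malloc,   /* stands in for the shared, user-accessible heap */
  .free_fn   = free,
};

/* ---- Kernel side ---- */

/* Validate the header found at 'base'. On the target, base would be the
 * fixed address (const void *)CONFIG_NUTTX_USERSPACE. */
static const struct user_header *kernel_find_user(const void *base)
{
  const struct user_header *hdr = base;

  if (hdr == NULL || hdr->magic != USER_HEADER_MAGIC ||
      hdr->entry == NULL || hdr->malloc_fn == NULL || hdr->free_fn == NULL)
    {
      return NULL;
    }

  return hdr;
}
```

Checking a magic value and the pointers before use is one simple way for the kernel to detect that no valid user blob was flashed at the expected address.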
Alignment, Regions, and Subregions
There are some important comments in the memory.ld file that are worth duplicating here:
“The STM32F407VG has 1024KB of FLASH beginning at address 0x0800:0000 and 192KB of SRAM. SRAM is split up into three blocks:
“112KB of SRAM beginning at address 0x2000:0000
“16KB of SRAM beginning at address 0x2001:c000
“64KB of CCM SRAM beginning at address 0x1000:0000
“When booting from FLASH, FLASH memory is aliased to address 0x0000:0000 where the code expects to begin execution by jumping to the entry point in the 0x0800:0000 address range.
“For MPU support, the kernel-mode NuttX section is assumed to be 128KB of FLASH and 4KB of SRAM. That is an excessive amount for the kernel, which should fit into 64KB and, of course, can be optimized as needed… Allowing the additional memory does permit additional debug instrumentation to be added to the kernel space without overflowing the partition.
“Alignment of the user space FLASH partition is also a critical factor: The user space FLASH partition will be spanned with a single region of size 2**n bytes. The alignment of the user-space region must be the same. As a consequence, as the user-space increases in size, the alignment requirement also increases.
“This alignment requirement means that the largest user space FLASH region you can have will be 512KB and it would have to be positioned at 0x08080000. If you change this address, don’t forget to change the CONFIG_NUTTX_USERSPACE configuration setting to match and to modify the check in kernel/userspace.c.
“For the same reasons, the maximum size of the SRAM mapping is limited to 4KB. Both of these alignment limitations could be reduced by using multiple MPU regions to map the FLASH/SRAM range or perhaps with some clever use of subregions.”
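The alignment rule quoted above comes down to a little arithmetic. On the ARMv7-M MPU a region's size is a power of two (at least 32 bytes) and its base address must be a multiple of that size. This C sketch (the helper names are illustrative) rounds a partition up to a legal region size and tests a base address. For example, with the user-space FLASH starting at 0x08020000 (the CONFIG_NUTTX_USERSPACE value used by the kostest configuration), a 128KB region is aligned but a 256KB region is not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smallest legal ARMv7-M MPU region size that covers nbytes: a power of
 * two, and never less than 32 bytes. */
static uint64_t mpu_region_size(uint64_t nbytes)
{
  uint64_t size = 32;

  while (size < nbytes)
    {
      size <<= 1;
    }

  return size;
}

/* A region of 'size' bytes (a power of two) may start at 'base' only if
 * base is a multiple of size. */
static bool mpu_base_aligned(uint64_t base, uint64_t size)
{
  return (base & (size - 1)) == 0;
}
```

This is why growing the user partition also raises its alignment requirement: a 256KB user region could not start at 0x08020000 and would have to move to a 256KB boundary such as 0x08040000.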
Memory Management
At present, there are two options for memory management in the NuttX protected build:
Single User Heap
By default, there is only a single user-space heap and heap allocator that is shared by both kernel and user modes. PROs: Simple and makes good use of the heap memory space. CONs: Awkward architecture and no security for kernel-mode allocations.
Dual, Partitioned Heaps
Two configuration options can change this behavior:
CONFIG_MM_MULTIHEAP=y. This changes internal memory manager interfaces so that multiple heaps can be supported.
CONFIG_MM_KERNEL_HEAP=y. Uses the multi-heap capability to enable a kernel heap.
If both options are defined, two heap partitions and two copies of the memory allocator are built:
One unprotected heap partition that will allocate user-accessible memory that is shared by both the kernel- and user-space code. That allocator physically resides in the user address space so that it can be called directly by both the user- and kernel-space code. There is a header at the beginning of the user-space blob; the kernel-space code gets the address of the user-space allocator from this header.
And another, protected heap partition that will allocate protected memory that is only accessible from the kernel code. This allocator is built into the kernel blob. This separate protected heap is required if you want to support security features.
NOTE: There are security issues with calling into the user-space allocators in kernel mode. That is a security hole that could be exploited to gain control of the system! Instead, the kernel code should switch to user mode before entering the memory allocator stubs (perhaps via a trap). The memory allocator stubs should then trap to return to kernel mode (as does the signal handler now).
The Traditional Approach
A more traditional approach would use something like the interface sbrk(). The sbrk() function adds memory to the heap space allocation of the calling process. In this case, there would still be kernel- and user-mode instances of the memory allocators. Each would call sbrk() as necessary to extend its heap; the pages allocated for the kernel-mode allocator would be protected but the pages allocated for the user-mode allocator would not.
PROs: Meets all of the needs. CONs: Complex. Memory losses due to quantization.
This approach works well with CPUs that have very capable Memory Management Units (MMUs) that can coalesce the sbrk-ed chunks into a contiguous, virtual heap region. Without an MMU, the sbrk-ed memory would not be contiguous; any single allocation would then be limited to what fits within one physically contiguous chunk.
Many MCUs will have Memory Protection Units (MPUs) that can support the security features (only). However, these lower-end MPUs may not support sufficient mapping capability to support this traditional approach. The ARMv7-M MPU, for example, only supports eight protection regions to manage all FLASH and SRAM, and so this approach would not be technically feasible for the ARMv7-M family (Cortex-M3/4).
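The memory losses due to quantization mentioned for the sbrk() approach are easy to quantify. If sbrk() can only hand out whole pages, every heap extension is rounded up to a page boundary. This sketch assumes a hypothetical 4096-byte page (HEAP_PAGE_SIZE; the real granule depends on the MMU or MPU) and computes how much is mapped for a request and how much of that is slack. An allocator may later use part of the slack, so this is a worst case per extension.

```c
#include <stddef.h>

/* Hypothetical page size; the real granule depends on the MMU or MPU. */
#define HEAP_PAGE_SIZE 4096u

/* Bytes actually mapped when a page-granular sbrk() extends the heap by
 * incr bytes: the request rounded up to a whole number of pages. */
static size_t sbrk_mapped(size_t incr)
{
  return (incr + HEAP_PAGE_SIZE - 1) / HEAP_PAGE_SIZE * HEAP_PAGE_SIZE;
}

/* Slack left in the last page of that extension. */
static size_t sbrk_slack(size_t incr)
{
  return sbrk_mapped(incr) - incr;
}
```

For example, a 5000-byte extension maps two pages (8192 bytes) and leaves 3192 bytes of slack.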
Comparing the “Flat” Build Configuration with the Protected Build Configuration
Compare, for example, the configurations boards/arm/stm32/stm32f4discovery/configs/ostest and boards/arm/stm32/stm32f4discovery/configs/kostest. These two configurations are identical except that one builds a “flat” version of the OS test and the other builds a kernel version of the OS test. See the file boards/arm/stm32/stm32f4discovery/README.txt for more details about those configurations.
The configurations can be compared using the cmpconfig tool:
cd tools
make -f Makefile.host cmpconfig
cd ..
tools/cmpconfig boards/arm/stm32/stm32f4discovery/configs/ostest/defconfig boards/arm/stm32/stm32f4discovery/configs/kostest/defconfig
Here is a summary of the meaning of all of the important differences in the configurations. This should be enough information for you to convert any configuration from a “flat” to a protected build:
CONFIG_BUILD_2PASS=y. This enables the two-pass build.
CONFIG_BUILD_PROTECTED=y. This option enables the “two pass” protected build.
CONFIG_PASS1_BUILDIR="boards/arm/stm32/stm32f4discovery/kernel". This tells the build system the (relative) location of the pass1 build directory.
CONFIG_PASS1_OBJECT="". In some “two pass” build configurations, the build system needs to know the name of the first pass object. This setting is not used for the protected build.
CONFIG_NUTTX_USERSPACE=0x08020000. This is the expected location where the user-mode blob will be located. The user-mode blob contains a header that includes information needed by the kernel blob in order to interface with the user code. That header will be expected to reside at this location.
CONFIG_PASS1_TARGET="all". This is the build target to use for invoking the pass1 make.
CONFIG_MM_MULTIHEAP=y. This changes internal memory manager interfaces so that multiple heaps can be supported.
CONFIG_MM_KERNEL_HEAP=y. NuttX supports the option of using a single user-accessible heap or, if this option is defined, two heaps: (1) one that will allocate user-accessible memory that is shared by both the kernel- and user-space code, and (2) one that will allocate protected memory that is only accessible from the kernel code. Separate heap memory is required if you want to support security features.
CONFIG_MM_KERNEL_HEAPSIZE=8192. This determines an approximate size for the kernel heap. The standard heap space is partitioned into a kernel- and user-heap space. The size of the kernel heap is only approximate because the user heap is subject to stringent alignment requirements. Because of the alignment requirements, the actual size of the kernel heap could be considerably larger than this.
CONFIG_BOARD_EARLY_INITIALIZE=y. This setting enables a special, early initialization call to initialize board-specific resources.
CONFIG_BOARD_LATE_INITIALIZE=y. This setting enables a special initialization call to initialize late board-specific resources. The difference between CONFIG_BOARD_EARLY_INITIALIZE and CONFIG_BOARD_LATE_INITIALIZE is that the CONFIG_BOARD_EARLY_INITIALIZE logic runs earlier in initialization, before the full operating system is up and running. CONFIG_BOARD_LATE_INITIALIZE, on the other hand, runs at the completion of initialization, just before the user applications are started. Neither CONFIG_BOARD_EARLY_INITIALIZE nor CONFIG_BOARD_LATE_INITIALIZE is used in the OS test configuration, but other configurations (such as NSH) require some application-specific initialization before the application can run. In the “flat” build, such initialization is performed as part of the application start-up sequence. This includes such things as initializing device drivers. These same initialization steps must be performed in kernel mode for the protected build, using CONFIG_BOARD_LATE_INITIALIZE. See boards/arm/stm32/stm32f4discovery/src/up_boot.c for an example of such board initialization code.
CONFIG_NSH_ARCHINITIALIZE is not defined. The setting CONFIG_NSH_ARCHINITIALIZE does not apply to the OS test configuration; however, it is noted here as an example of initialization that cannot be performed in the protected build.
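Why CONFIG_MM_KERNEL_HEAPSIZE is only approximate can be seen with a worked example. The sketch below assumes one particular partitioning policy, which may differ from what a given board actually does: the user heap is placed at the top of RAM, its size is rounded down to a power of two so that a single MPU region can map it, and everything between the start of the heap and the user heap becomes the kernel heap. The struct, the function names, and the addresses in the example are hypothetical.

```c
#include <stdint.h>

/* Result of splitting one heap range into kernel and user heaps. */
struct heap_split
{
  uintptr_t kbase;   /* start of the kernel (protected) heap */
  uintptr_t ksize;   /* its size in bytes */
  uintptr_t ubase;   /* start of the user (MPU-mapped) heap */
  uintptr_t usize;   /* its size: a power of two */
};

/* Largest power of two that is <= n, for n > 0. */
static uintptr_t pow2_floor(uintptr_t n)
{
  uintptr_t p = 1;

  while (p <= n / 2)
    {
      p <<= 1;
    }

  return p;
}

/* Hypothetical policy: reserve kernel_request bytes at heap_start, map the
 * largest power-of-two user heap that ends at ram_end, and give whatever
 * is left over to the kernel. Returns 0 on success, -1 if the request
 * cannot fit or ram_end is not aligned to the chosen user heap size. */
static int heap_partition(uintptr_t heap_start, uintptr_t ram_end,
                          uintptr_t kernel_request, struct heap_split *out)
{
  uintptr_t avail;

  if (ram_end <= heap_start || ram_end - heap_start <= kernel_request)
    {
      return -1;
    }

  avail      = ram_end - heap_start - kernel_request;
  out->usize = pow2_floor(avail);
  out->ubase = ram_end - out->usize;

  if ((out->ubase & (out->usize - 1)) != 0)
    {
      return -1;   /* a single MPU region could not map this user heap */
    }

  out->kbase = heap_start;
  out->ksize = out->ubase - heap_start;
  return 0;
}
```

With a hypothetical heap running from 0x20002000 to 0x20020000 and an 8192-byte request, 0x1C000 bytes remain for the user heap; rounding down gives a 64KB user heap at 0x20010000, so the kernel heap grows to 0xE000 bytes (56KB), far more than the 8KB requested.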
Architecture-Specific Options:
CONFIG_SYS_RESERVED=8. The user application logic interfaces with the kernel blob using system calls. The architecture-specific logic may need to reserve a few system calls for its own internal use. The ARMv7-M architectures all require 8 reserved system calls.
CONFIG_SYS_NNEST=2. System calls may be nested. The system must retain information about each nested system call, and this setting is used to set aside resources for nested system calls. In the current architecture, a maximum nesting level of two is all that is needed.
CONFIG_ARMV7M_MPU=y. This setting enables support for the ARMv7-M Memory Protection Unit (MPU). The MPU is used to prohibit user-mode access to kernel resources.
CONFIG_ARMV7M_MPU_NREGIONS=8. The ARMv7-M MPU supports 8 protection regions.
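CONFIG_SYS_NNEST sizes a fixed set of per-thread records. As a hypothetical model: each system call entry pushes a frame describing how to return, and each exit pops it, so the array needs only SYS_NNEST entries and a deeper nest is refused. The structure and field names below (syscall_frame, syscall_state, syscall_enter(), syscall_leave()) are illustrative and are not the ARMv7-M implementation.

```c
#include <stdint.h>

#define SYS_NNEST 2   /* mirrors CONFIG_SYS_NNEST */

/* What must be remembered to unwind one system call (hypothetical). */
struct syscall_frame
{
  uintptr_t return_pc;    /* where to resume when this syscall returns */
  uint32_t  saved_mode;   /* privilege state to restore on return */
};

/* Per-thread bookkeeping with room for SYS_NNEST nested syscalls. */
struct syscall_state
{
  int                  depth;
  struct syscall_frame frames[SYS_NNEST];
};

/* Record a syscall entry; fails if the nesting limit would be exceeded. */
static int syscall_enter(struct syscall_state *st, uintptr_t pc,
                         uint32_t mode)
{
  if (st->depth >= SYS_NNEST)
    {
      return -1;
    }

  st->frames[st->depth].return_pc  = pc;
  st->frames[st->depth].saved_mode = mode;
  st->depth++;
  return 0;
}

/* Pop the innermost syscall; nested calls unwind in LIFO order. */
static int syscall_leave(struct syscall_state *st, struct syscall_frame *out)
{
  if (st->depth <= 0)
    {
      return -1;
    }

  *out = st->frames[--st->depth];
  return 0;
}
```

Because nested calls always return innermost first, a simple stack indexed by the current depth is enough to hold the state.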
Size Expansion
The protected build will, of course, result in a FLASH image that is larger than that of the corresponding “flat” build. How much larger?
I don’t have the numbers in hand, but you can build boards/arm/stm32/stm32f4discovery/configs/ostest and boards/arm/stm32/stm32f4discovery/configs/kostest and compare the resulting binaries for yourself using the size command.
Increases in size are expected because:
The syscall layer is included in the protected build but not the flat build.
The kernel-side syscall stubs will cause all enabled OS code to be drawn into the build. In the flat build, only those OS interfaces actually called by the application will be included in the final objects.
The dual memory allocators will increase size.
Code duplication. Some code, such as the C library, will be duplicated in both the kernel- and user-blobs, and
Alignment. The alignments required by the MPU logic will leave relatively large regions of FLASH (and perhaps RAM) unusable.
Performance Issues
The only performance differences in the protected build should result from the syscalls used to interact with the OS, as opposed to the direct C calls used in the flat build. If your performance is highly dependent upon high-rate OS calls, then this could be an issue for you. But, in the typical application, OS calls do not often figure into the critical performance paths.
The syscalls are, ultimately, software interrupts. If the platform does not support prioritized, nested interrupts, then the syscall execution could also delay other hardware interrupt processing. However, syscall processing is brief: the handler essentially just arranges to return in supervisor mode and vectors to the syscall stub. Syscalls should therefore be very fast and, for typical real-time applications, should cause no issues.