NuttX Protected Build

The Traditional “Flat” Build

The traditional NuttX build is a “flat” build. By flat, I mean that when you build NuttX, you end up with a single “blob” called nuttx. All of the components of the build reside in the same address space. All components of the build can access all other components of the build.

The “Two Pass” Protected Build

The NuttX protected build, on the other hand, is a “two-pass” build and generates two “blobs”: (1) a separately compiled and linked kernel blob called, again, nuttx, and (2) a separately compiled and linked user blob called nuttx_user.elf (in the existing build configurations). The user blob is created on pass 1 and the kernel blob is created on pass 2.

These two make commands are equivalent:

make
make pass1 pass2

But the second is clearer and I prefer to use it for the protected build. In the second case, the user and kernel blobs are built separately; in the first, the kernel and user blob builds may be intermixed and somewhat confusing. You can also build the kernel and user blobs separately with one of the following commands:

make pass1
make pass2

At the end of the build, there will be several files in the top-level NuttX build directory. From Pass 1:

  • nuttx_user.elf. The pass1 user-space ELF file

  • nuttx_user.hex. The pass1 Intel HEX format file (selected in defconfig)

  • User.map. Symbols in the user-space ELF file

From Pass 2:

  • nuttx. The pass2 kernel-space ELF file

  • nuttx.hex. The pass2 Intel HEX file (selected in defconfig)

  • System.map. Symbols in the kernel-space ELF file

The Memory Protection Unit

If the MCU supports a Memory Protection Unit (MPU), then all of the logic within the kernel blob executes in kernel-mode, i.e., with all privileges. These privileged threads can access all memory, all CPU instructions, and all MCU registers. The logic within the user-mode blob, on the other hand, executes in user-mode with certain restrictions enforced by the MCU and by the MPU. The MCU may restrict access to certain registers and machine instructions; with the MPU, access to all kernel memory resources is prohibited from the user logic. This includes the kernel blob’s FLASH, .bss/.data storage, and the kernel heap memory.

Advantages of the Protected Build

The advantages of such a protected build are (1) security and (2) modularity. Since the kernel resources are protected, it will be much less likely that a misbehaving task will crash the system or that a wild pointer access will corrupt critical memory. This security also provides a safer environment in which to execute 3rd party software and prevents “snooping” into the kernel memory from the hosted applications.

Modularity is assured because there is strict control of the exposed kernel interfaces. In the flat build, all symbols are exposed and there is no enforcement of a kernel API. With the protected build, on the other hand, all interactions with the kernel from the user application logic must use system calls (or syscalls) to interface with the OS. A system call is necessary to transition from user-mode to kernel-mode; all user-space operating system interfaces are via syscall proxies. Then, while in kernel mode, the kernel system call handler will perform the OS service requested by the application. At the conclusion of system processing, user privileges are restored and control is returned to the user application. Since the only interactions with the kernel can be through supported system calls, modularity of the OS is guaranteed.

User-Space Proxies/Kernel-Space Stubs

The same OS interfaces are exposed to the application in both the “flat” build and the protected build. The difference is that in the protected build, the user code interfaces with a proxy for the OS function. For example, here is what a proxy for the OS getpid() interface looks like:

#include <unistd.h>
#include <syscall.h>
pid_t getpid(void)
{
    return (pid_t)sys_call0(SYS_getpid);
}

Thus the getpid() proxy is a stand-in for the real OS getpid() interface that executes a system call so the kernel code can perform the real getpid() operation on behalf of the user application. Proxies are auto-generated for all exported OS interfaces using the CSV file syscall/syscall.csv and the program tools/mksyscalls. Similarly, on the kernel-side, there are auto-generated stubs that map the system calls back into real OS calls. These, however, are internal to the OS and the implementation may be architecture-specific. See the README.txt files in those directories for further information.
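The proxy/stub pairing described above can be sketched as a host-side simulation. This is only an illustration and not the generated NuttX code: every sim_ name is hypothetical, the value 42 is arbitrary, and an ordinary function call stands in for the trap that a real proxy uses to switch into kernel mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical syscall numbers; the real ones are generated by the build. */
enum { SIM_SYS_getpid = 0, SIM_SYS_count };

/* "Kernel side": a stand-in for the real OS function (42 is arbitrary). */
static int sim_kernel_getpid(void)
{
    return 42;
}

/* "Kernel side": the stub maps the system call back onto the real OS call. */
static uintptr_t sim_stub_getpid(void)
{
    return (uintptr_t)sim_kernel_getpid();
}

/* Table of stubs, indexed by syscall number. */
typedef uintptr_t (*sim_stub0_t)(void);
static const sim_stub0_t g_sim_stubs[SIM_SYS_count] = { sim_stub_getpid };

/* Stand-in for the trap: on real hardware this is a software interrupt
 * that raises the privilege level before the stub is dispatched. */
uintptr_t sim_sys_call0(unsigned int nbr)
{
    if (nbr >= SIM_SYS_count || g_sim_stubs[nbr] == NULL) {
        return (uintptr_t)-1;   /* unknown system call */
    }
    return g_sim_stubs[nbr]();
}

/* "User side": the proxy has the same shape as the OS interface. */
int sim_getpid(void)
{
    return (int)sim_sys_call0(SIM_SYS_getpid);
}
```

The point of the sketch is that the application only ever sees sim_getpid(); the table lookup is the single, controlled entry point into the kernel side.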

Combining Intel HEX Files

One issue that you may face is that the two-pass build creates two FLASH images. Some debuggers that I use will allow me to write each image to FLASH separately. Others will expect to have a single Intel HEX image. In this latter case, you may need to combine the two Intel HEX files into one. Here is how you can do that:

  1. The tail of the nuttx.hex file should look something like this (with my comments and spaces added):

$ tail nuttx.hex
# 00, data records
...
:10 9DC0 00 01000000000800006400020100001F0004
:10 9DD0 00 3B005A0078009700B500D400F300110151
:08 9DE0 00 30014E016D0100008D
# 05, Start Linear Address Record
:04 0000 05 0800 0419 D2
# 01, End Of File record
:00 0000 01 FF

Use an editor such as vi to remove the 05 and 01 records.

  2. The head of the nuttx_user.hex file should look something like this (again with my comments and spaces added):

$ head nuttx_user.hex
# 04, Extended Linear Address Record
:02 0000 04 0801 F1
# 00, data records
:10 8000 00 BD89 01084C800108C8110208D01102087E
:10 8010 00 0010 00201C1000201C1000203C16002026
:10 8020 00 4D80 01085D80010869800108ED83010829
...

Nothing needs to be done here. The nuttx_user.hex file should be fine.

  3. Combine the edited nuttx.hex and unedited nuttx_user.hex files to produce a single combined hex file:

$ cat nuttx.hex nuttx_user.hex >combined.hex

Then use the combined.hex file with your FLASH/JTAG tool. If you do this a lot, you will probably want to invest a little time to develop a tool to automate these steps.
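As a starting point for such a tool, the steps above might be sketched in C as follows. The function names hex_record_type() and hex_combine() are made up for this example. The sketch assumes well-formed input without the spaces and comments added in the listings above, and it does not verify checksums.

```c
#include <stdio.h>
#include <string.h>

/* Return the record type of one Intel HEX line, or -1 if the line is not
 * a record.  Layout is ':' LL AAAA TT ..., so the two hex digits of the
 * type are at character offsets 7 and 8.  The shortest legal record
 * (the End Of File record) is 11 characters long. */
int hex_record_type(const char *line)
{
    unsigned int type;

    if (line[0] != ':' || strlen(line) < 11) {
        return -1;
    }
    if (sscanf(line + 7, "%2x", &type) != 1) {
        return -1;
    }
    return (int)type;
}

/* Copy the kernel image without its 05 (Start Linear Address) and
 * 01 (End Of File) records, then append the user image unchanged.
 * Returns 0 on success, -1 on a write error. */
int hex_combine(FILE *kernel, FILE *user, FILE *out)
{
    char line[600];   /* longer than the longest possible record */

    while (fgets(line, sizeof(line), kernel) != NULL) {
        int type = hex_record_type(line);
        if (type == 0x05 || type == 0x01) {
            continue;             /* the records removed by hand in step 1 */
        }
        if (fputs(line, out) == EOF) {
            return -1;
        }
    }
    while (fgets(line, sizeof(line), user) != NULL) {
        if (fputs(line, out) == EOF) {
            return -1;
        }
    }
    return 0;
}
```

A small main() that opens nuttx.hex, nuttx_user.hex, and combined.hex with fopen() and passes them to hex_combine() would complete the tool.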

Files and Directories

Here is a summary of directories and files used by the STM32F4Discovery protected build:

  • boards/arm/stm32/stm32f4discovery/configs/kostest. This is the kernel mode OS test configuration. The two standard configuration files can be found in this directory: (1) defconfig and (2) Make.defs.

  • boards/arm/stm32/stm32f4discovery/kernel. This is the first pass build directory. The Makefile in this directory is invoked to produce the pass1 object (nuttx_user.elf in this case). The second pass object is created by arch/arm/src/Makefile. Also in this directory is the file userspace.c. The user-mode blob contains a header that includes information needed by the kernel blob in order to interface with the user code. That header is defined by this file.

  • boards/arm/stm32/stm32f4discovery/scripts. Linker scripts for the kernel mode build are found in this directory. This includes (1) memory.ld, which holds the common memory map, (2) user-space.ld, which is used for linking the pass1 user-mode blob, and (3) kernel-space.ld, which is used for linking the pass2 kernel-mode blob.

Alignment, Regions, and Subregions

There are some important comments in the memory.ld file that are worth duplicating here:

“The STM32F407VG has 1024KB of FLASH beginning at address 0x0800:0000 and 192KB of SRAM. SRAM is split up into three blocks:

  • “112KB of SRAM beginning at address 0x2000:0000

  • “16KB of SRAM beginning at address 0x2001:c000

  • “64KB of CCM SRAM beginning at address 0x1000:0000

“When booting from FLASH, FLASH memory is aliased to address 0x0000:0000 where the code expects to begin execution by jumping to the entry point in the 0x0800:0000 address range.

“For MPU support, the kernel-mode NuttX section is assumed to be 128KB of FLASH and 4KB of SRAM. That is an excessive amount for the kernel which should fit into 64KB and, of course, can be optimized as needed… Allowing the additional memory does permit additional debug instrumentation to be added to the kernel space without overflowing the partition.

“Alignment of the user space FLASH partition is also a critical factor: The user space FLASH partition will be spanned with a single region of size 2**n bytes. The alignment of the user-space region must be the same. As a consequence, as the user-space increases in size, the alignment requirement also increases.

“This alignment requirement means that the largest user space FLASH region you can have will be 512KB and it would have to be positioned at 0x08080000. If you change this address, don’t forget to change the CONFIG_NUTTX_USERSPACE configuration setting to match and to modify the check in kernel/userspace.c.

“For the same reasons, the maximum size of the SRAM mapping is limited to 4KB. Both of these alignment limitations could be reduced by using multiple MPU regions to map the FLASH/SRAM range or perhaps with some clever use of subregions.”
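The alignment rule quoted above (a single region of 2**n bytes whose base address is aligned to its own size) can be checked with a few lines of C. This is a sketch for experimenting with candidate addresses on a host machine; the function names are invented for the example and are not part of NuttX.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a power of two (2**n for some n). */
bool is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* A single region must be 2**n bytes long and its base address must be
 * aligned to that same size. */
bool mpu_region_ok(uint32_t base, uint32_t size)
{
    return is_pow2(size) && (base & (size - 1)) == 0;
}

/* Largest single region that can start at 'base' and still end at or
 * before 'limit' (one past the last usable address).  Returns 0 if
 * 'base' is not below 'limit'. */
uint32_t mpu_max_region(uint32_t base, uint32_t limit)
{
    uint32_t size;

    if (base >= limit) {
        return 0;
    }
    for (size = 0x80000000u; size != 0; size >>= 1) {
        if ((base & (size - 1)) == 0 && size <= limit - base) {
            return size;
        }
    }
    return 0;
}
```

For example, mpu_region_ok(0x08020000, 128 * 1024) holds for the CONFIG_NUTTX_USERSPACE value used in this configuration, while a 512KB region at that same address fails the alignment test. This is the sense in which the alignment requirement grows with the size of the user space.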

Memory Management

At present, there are two options for memory management in the NuttX protected build:

Single User Heap

By default, there is only a single user-space heap and heap allocator that is shared by both kernel- and user-modes. PROs: Simple and makes good use of the heap memory space. CONs: Awkward architecture and no security for kernel-mode allocations.

Dual, Partitioned Heaps

Two configuration options can change this behavior:

  • CONFIG_MM_MULTIHEAP=y. This changes internal memory manager interfaces so that multiple heaps can be supported.

  • CONFIG_MM_KERNEL_HEAP=y. Uses the multi-heap capability to enable a kernel heap

If both options are defined, then two heap partitions and two copies of the memory allocators are built:

One un-protected heap partition that will allocate user accessible memory that is shared by both the kernel- and user-space code. That allocator physically resides in the user address space so that it can be called directly by both the user- and kernel-space code. There is a header at the beginning of the user-space blob; the kernel-space code gets the address of the user-space allocator from this header.

And another protected heap partition that will allocate protected memory that is only accessible from the kernel code. This allocator is built into the kernel blob. This separate protected heap is required if you want to support security features.

NOTE: There are security issues with calling into the user space allocators in kernel mode. That is a security hole that could be exploited to gain control of the system! Instead, the kernel code should switch to user mode before entering the memory allocator stubs (perhaps via a trap). The memory allocator stubs should then trap to return to kernel mode (as does the signal handler now).
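The role of the header at the beginning of the user-space blob can be illustrated with a small host-side sketch. The structure layout and all sim_ names below are hypothetical and are not the definitions from kernel/userspace.c. On a target, the kernel would locate the header at the fixed address given by CONFIG_NUTTX_USERSPACE rather than through a linked symbol, and the standard malloc()/free() used here only stand in for the user-space allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed at the start of the user-space blob. */
struct sim_userspace_hdr {
    void *(*us_malloc)(size_t size);   /* user-space allocator entry */
    void  (*us_free)(void *mem);       /* user-space free entry */
};

/* "User blob": the allocator physically lives here. */
static void *sim_user_malloc(size_t size)
{
    return malloc(size);
}

static void sim_user_free(void *mem)
{
    free(mem);
}

const struct sim_userspace_hdr g_sim_userspace = {
    sim_user_malloc,
    sim_user_free
};

/* "Kernel blob": knows only where the header is, not the functions. */
#define SIM_USERSPACE (&g_sim_userspace)

void *sim_kumalloc(size_t size)
{
    return SIM_USERSPACE->us_malloc(size);
}

void sim_kufree(void *mem)
{
    SIM_USERSPACE->us_free(mem);
}
```

Note that sim_kumalloc() calls straight into user-space code, which is exactly the direct call that the NOTE above warns about.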

The Traditional Approach

A more traditional approach would use something like the interface sbrk(). The sbrk() function adds memory to the heap space allocation of the calling process. In this case, there would still be kernel- and user-mode instances of the memory allocators. Each would sbrk() as necessary to extend their heap; the pages allocated for the kernel-mode allocator would be protected but the pages allocated for the user-mode allocator would not. PROs: Meets all of the needs. CONs: Complex. Memory losses due to quantization.

This approach works well with CPUs that have very capable Memory Management Units (MMUs) that can coalesce the sbrk-ed chunks to a contiguous, virtual heap region. Without an MMU, the sbrk-ed memory would not be contiguous; this would limit the sizes of allocations due to the physical pages.

Many MCUs will have Memory Protection Units (MPUs) that can support the security features (only). However, these lower end MPUs may not support sufficient mapping capability to support this traditional approach. The ARMv7-M MPU, for example, only supports eight protection regions to manage all FLASH and SRAM and so this approach would not be technically feasible for the ARMv7-M family (Cortex-M3/4).
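The “memory losses due to quantization” mentioned above come from extending the heap in whole pages. The following is a minimal sketch of that effect, assuming a 4KB page and a small static arena in place of real page allocation. The names and sizes are invented for the example, and sim_sbrk() is deliberately simpler than the standard sbrk() (it takes a size_t and returns NULL on failure).

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE   4096u   /* assumed page size */
#define SIM_ARENA_PAGES 8u      /* assumed number of available pages */

static uint8_t g_sim_arena[SIM_PAGE_SIZE * SIM_ARENA_PAGES];
static size_t  g_sim_brk;       /* current break, as an offset */

/* Number of bytes wasted when 'incr' is rounded up to whole pages. */
size_t sim_quantization_loss(size_t incr)
{
    size_t pages = (incr + SIM_PAGE_SIZE - 1) / SIM_PAGE_SIZE;
    return pages * SIM_PAGE_SIZE - incr;
}

/* Extend the simulated heap by at least 'incr' bytes, in whole pages.
 * Returns the previous break, or NULL if the arena is exhausted. */
void *sim_sbrk(size_t incr)
{
    size_t bytes = incr + sim_quantization_loss(incr);
    void *prev = &g_sim_arena[g_sim_brk];

    if (bytes > sizeof(g_sim_arena) - g_sim_brk) {
        return NULL;
    }
    g_sim_brk += bytes;
    return prev;
}
```

Because the arena here is one contiguous array, successive calls return adjacent memory; that is the property an MMU would have to provide for physically scattered pages.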

Comparing the “Flat” Build Configuration with the Protected Build Configuration

Compare, for example, the configuration boards/arm/stm32/stm32f4discovery/configs/ostest and the configuration boards/arm/stm32/stm32f4discovery/configs/kostest. These two configurations are identical except that one builds a “flat” version of the OS test and the other builds a kernel version of the OS test. See the file boards/arm/stm32/stm32f4discovery/README.txt for more details about those configurations.

The configurations can be compared using the cmpconfig tool:

cd tools
make -f Makefile.host cmpconfig
cd ..
tools/cmpconfig boards/arm/stm32/stm32f4discovery/configs/ostest/defconfig boards/arm/stm32/stm32f4discovery/configs/kostest/defconfig

Here is a summary of the meaning of all of the important differences in the configurations. This should be enough information for you to convert any configuration from a “flat” to a protected build:

  • CONFIG_BUILD_2PASS=y. This enables the two pass build.

  • CONFIG_BUILD_PROTECTED=y. This option enables the “two pass” protected build.

  • CONFIG_PASS1_BUILDIR="boards/arm/stm32/stm32f4discovery/kernel". This tells the build system the (relative) location of the pass1 build directory.

  • CONFIG_PASS1_OBJECT="". In some “two pass” build configurations, the build system needs to know the name of the first pass object. This setting is not used for the protected build.

  • CONFIG_NUTTX_USERSPACE=0x08020000. This is the expected location where the user-mode blob will be located. The user-mode blob contains a header that includes information needed by the kernel blob in order to interface with the user code. That header will be expected to reside at this location.

  • CONFIG_PASS1_TARGET="all". This is the build target to use for invoking the pass1 make.

  • CONFIG_MM_MULTIHEAP=y. This changes internal memory manager interfaces so that multiple heaps can be supported.

  • CONFIG_MM_KERNEL_HEAP=y. NuttX supports the option of using a single user-accessible heap or, if this option is defined, two heaps: (1) one that will allocate user accessible memory that is shared by both the kernel- and user-space code, and (2) one that will allocate protected memory that is only accessible from the kernel code. Separate heap memory is required if you want to support security features.

  • CONFIG_MM_KERNEL_HEAPSIZE=8192. This determines an approximate size for the kernel heap. The standard heap space is partitioned into a kernel- and user-heap space. The size of the kernel heap is only approximate because the user heap is subject to stringent alignment requirements. Because of the alignment requirements, the actual size of the kernel heap could be considerably larger than this.

  • CONFIG_BOARD_EARLY_INITIALIZE=y. This setting enables a special, early initialization call to initialize board-specific resources.

  • CONFIG_BOARD_LATE_INITIALIZE=y. This setting enables a special initialization call to initialize late board-specific resources. The difference between CONFIG_BOARD_EARLY_INITIALIZE and CONFIG_BOARD_LATE_INITIALIZE is that the CONFIG_BOARD_EARLY_INITIALIZE logic runs earlier in initialization, before the full operating system is up and running. CONFIG_BOARD_LATE_INITIALIZE, on the other hand, runs at the completion of initialization, just before the user applications are started. Neither CONFIG_BOARD_EARLY_INITIALIZE nor CONFIG_BOARD_LATE_INITIALIZE is used in the OS test configuration, but other configurations (such as NSH) require some application-specific initialization before the application can run. In the “flat” build, such initialization is performed as part of the application start-up sequence. This includes such things as initializing device drivers. In the protected build, these same initialization steps must be performed in kernel mode, which is what CONFIG_BOARD_LATE_INITIALIZE is for. See boards/arm/stm32/stm32f4discovery/src/up_boot.c for an example of such board initialization code.

  • CONFIG_NSH_ARCHINITIALIZE is not defined. The setting CONFIG_NSH_ARCHINITIALIZE does not apply to the OS test configuration. It is noted here as an example of initialization that cannot be performed in the protected build.

Architecture-Specific Options:

  • CONFIG_SYS_RESERVED=8. The user application logic interfaces with the kernel blob using system calls. The architecture-specific logic may need to reserve a few system calls for its own internal use. The ARMv7-M architectures all require 8 reserved system calls.

  • CONFIG_SYS_NNEST=2. System calls may be nested. The system must retain information about each nested system call and this setting is used to set aside resources for nested system calls. In the current architecture, a maximum nesting level of two is all that is needed.

  • CONFIG_ARMV7M_MPU=y. This setting enables support for the ARMv7-M Memory Protection Unit (MPU). The MPU is used to prohibit user-mode access to kernel resources.

  • CONFIG_ARMV7M_MPU_NREGIONS=8. The ARMv7-M MPU supports 8 protection regions.

Size Expansion

The protected build will, of course, result in a FLASH image that is larger than that of the corresponding “flat” build. How much larger? I don’t have the numbers in hand, but you can build boards/arm/stm32/stm32f4discovery/configs/nsh and boards/arm/stm32/stm32f4discovery/configs/kostest and compare the resulting binaries for yourself using the size command.

Increases in size are expected because:

  • The syscall layer is included in the protected build but not the flat build.

  • The kernel-side syscall stubs will cause all enabled OS code to be drawn into the build. In the flat build, only those OS interfaces actually called by the application will be included in the final objects.

  • The dual memory allocators will increase size.

  • Code duplication. Some code, such as the C library, will be duplicated in both the kernel- and user-blobs, and

  • Alignment. The alignments required by the MPU logic will leave relatively large regions of FLASH (and perhaps RAM) that are not usable.

Performance Issues

The only performance differences using the protected build should result from the syscalls used to interact with the OS vs. the direct C calls used in the flat build. If your performance is highly dependent upon high rate OS calls, then this could be an issue for you. But, in the typical application, OS calls do not often figure into the critical performance paths.

The syscalls are, ultimately, software interrupts. If the platform does not support prioritized, nested interrupts, then syscall execution could also delay other hardware interrupt processing. However, syscall processing overhead is negligible: the handlers really just configure the return to supervisor mode and vector to the syscall stub. They should be lightning fast and, for typical real-time applications, should cause no issues.