Porting to the BCM2711 (Raspberry Pi 4B)
This port was completed for the 12.7.0 version of the NuttX kernel, and was contributed by Matteo Golin.
The pull request with this initial support can be found at apache/nuttx/pull/15188.
The port required support to be written for a new chip (the BCM2711) and a new board. Matteo created journal entries while working on the initial port, which can be found on his blog. The details below are a more concise summary of the porting process.
Researching
The first step to porting a board to NuttX was researching the board and how NuttX works.
The BCM2711 is a quad-core ARM Cortex A72 based SoC, and it supports both aarch64 and 32 bit ARM architectures. I focused on the aarch64 implementation only in this port. My first step was determining other boards already in the NuttX kernel that used the aarch64 architecture, because that gives me a starting point to porting this new chip and board.
I primarily used the blog posts written by Lup Yuen Lee about porting NuttX to the PinePhone, another ARM Cortex-A based device. The articles are listed here. Lup’s articles provided me with an understanding of the NuttX boot process, as well as which files from the aarch64 support on NuttX were pulled into the build process for booting. He also showed how he created an initial UART driver using the NuttX structure for UART drivers, which allowed him to get NSH appearing in the console.
Finally, I also of course needed the BCM2711 datasheet in order to figure out which registers were available to me for creating peripheral drivers. The BCM2711 datasheet isn’t exceptionally detailed on many of the features on the SoC, but it did provide enough detail to set up interrupts and get UART working.
Adding to the source tree
In order to build my code with the NuttX build system, I would have to add the board and the BCM2711 chip to the source
tree for NuttX. This way, it would appear as an available configuration via the tools/configure.sh
script and I
could select options for it with make menuconfig
.
The first thing to do was to add the chip, which goes under the arch/arm64
directory because it is an ARM 64 bit
SoC. The chip directory must be added in two places: arch/arm64/include/bcm2711
and arch/arm64/src/bcm2711
. C
files go in the src
directory with some header files, and some specific header files go in the include
directory.
In addition, in order to make the BCM2711 visible as a supported chip, I had to add it as an option in
arch/arm64/Kconfig
. In order to do this, I just copy-pasted the entry for the Allwinner A64, since the two chips
were very similar. I had to change a few fields (for instance, selecting ARCH_CORTEX_A72
instead of
ARCH_CORTEX_A53
), but this was relatively simple to complete with the information about the SoC. I also needed to
specify ARMV8A_HAVE_GICv2
, since that is the interrupt controller used by the BCM2711. ARCH_HAVE_MULTICPU
because it is a quad-core, and ARCH_USE_MMU
because it has a memory management unit.
I also needed to now add the Raspberry Pi 4B board to the source tree. To do this, I copied the board folder for the
PinePhone (boards/arm64/a64/pinephone
) and renamed it raspberrypi-4b
. I also deleted many of the files in this
folder since they weren’t applicable to the Pi 4B, and substituted all mentions of the PinePhone with the Raspberry Pi
4B (in path names and header include guards).
I then added the Pi 4B to the list of supported boards in boards/Kconfig
. For this, I just needed to create an entry
with the name ARCH_BOARD_RASPBERRYPI_4B
and write that it depends on the ARCH_CHIP_BCM2711
. No additional
options necessary! In two other places in this file I also had to add some directives to make sure the Kconfig for the
board was found properly. These set ARCH_BOARD
to the name of the board directory “raspberrypi-4b” when the Pi 4B was
selected, and source
’d the Kconfig under boards/arm64/bcm2711/raspberrypi-4b
when selected.
The default configuration for this board was copied from the PinePhone’s NSH configuration, which I modified to use the correct board name, chip, and hardware specific settings. It was still incomplete because there was no code to actually boot into NSH, but it was a starting point.
This was basically all I needed for the board to show up as a possible configuration in the source tree!
Mapping out the chip
To start writing code for the BCM2711, I needed to map out the chip. This included the register addresses and the memory
mapping, which could all be found in the BCM2711 datasheet. From looking at other implementations, the register
addresses are usually defined as C macros and kept in header files under arch/<architecture>/src/<chip>/hardware
.
This is where I put them as well, defining all the register mappings the different groups within individual files (i.e.
bmc2711_i2c.h
, bcm2711_spi.h
, etc.).
Many peripherals had groupings of memory-mapped registers, defined using a base address and then offsets from that address to access the different fields. For instance, the two mini-SPI peripherals had the same structure, each with 12 registers. The way I commonly saw these macros implemented was something like:
#define BCM_AUX_SPI1_BASEADDR (BCM_AUX_BASEADDR + BCM_AUX_SPI1_OFFSET)
#define BCM_AUX_SPI_CNTL0_REG_OFFSET (0x00) /* SPI control register 0 */
/* ... more register offsets */
/* This allows you to choose which SPI interface base address to get the register for. */
#define BCM_AUX_SPI_CNTL0(base) ((base) + BCM_AUX_SPI_CNTL0_REG_OFFSET)
In addition to the registers themselves, I also included macros to mask certain fields within the registers or set certain values. This makes the code less error prone later, because any mistakes made while copying the long list of fields and registers from the datasheet can be changed in one place.
#define BCM_SPI_CNTL0_EN (1 << 11) /* Enable SPI interface */
In addition to the registers, I also had to map the interrupts. This was done in include/bcm2711/irq.h
. I copied the
IRQ numbers from the datasheet and listed them all as macros with names. I also had to define the number of IRQS, which
was 216 in this case. The MPID_TO_CORE(mpid)
macro was copied from another arm64 implementation.
#define NR_IRQS 216
#define MPID_TO_CORE(mpid) (((mpid) >> MPIDR_AFF0_SHIFT) & MPIDR_AFFLVL_MASK)
/* VideoCore interrupts */
#define BCM_IRQ_VC_BASE 96
#define BCM_IRQ_VC(n) (BCM_IRQ_VC_BASE + n)
#define BCM_IRQ_VC_TIMER0 BCM_IRQ_VC(0)
#define BCM_IRQ_VC_TIMER1 BCM_IRQ_VC(1)
/* More interrupts ... */
Finally was to define the memory mapping within the include/bcm2711/chip.h
file. I did so simply since I was only
testing on the 4GB version of the BCM2711. The RAM starts at address 0, and is roughly 4GB in size. 64 MB of that is
reserved for the memory-mapped I/O, so I had to be sure to remove that. I also defined the load address of the kernel in
memory for the chip.
#define CONFIG_RAMBANK1_ADDR (0x000000000)
/* Both the 4GB and 8GB ram variants use all the size in RAMBANK1 */
#if defined(CONFIG_RPI4B_RAM_4GB) || defined(CONFIG_RPI4B_RAM_8GB)
#define CONFIG_RAMBANK1_SIZE GB(4) - MB(64)
#endif /* defined(CONFIG_RPI4B_RAM_4GB) || defined(CONFIG_RPI4B_RAM_8GB) */
/* Raspberry Pi 4B loads NuttX at this address */
#define CONFIG_LOAD_BASE 0x480000
The same load address had to be specified in the linker script for the Raspberry Pi 4B kernel. This scripts tells the
compiler how to lay out the kernel code in memory and what addresses to use. I was able to copy it from the PinePhone
and just change the load address to 0x480000
.
Figuring out the boot
The first thing I wanted to do was determine how much work had already been done for aarch64 that would allow me to more easily complete the port. In Lup’s blogs, he tested out support for his core type (ARM Cortex-A53 on the PinePhone) by booting the aarch64 instance of QEMU with NuttX using that core. I decided to take the same approach, and was able to successfully boot on ARM Cortex-A72 using QEMU following his blog. This was a nice confirmation that the hardware I was using was already supported in NuttX for booting the OS and getting NSH working with a PL011 UART interface.
I cannot stress enough that the reason porting to this chip was made so much easier was because I am standing on the shoulders of giants. NuttX contributors had already set up the boot scripts written in assembly, timer configuration, interrupt handling and drivers for a lot of the standard features in aarch64 architectures. I did not have to deal with any of this because of them, and it really cut down on the amount of assembly I had to read and understand. I also barely had to write any assembly outside of debugging the boot process a little (we’ll get to that later). Not to mention I had Lup’s well-written articles to guide me.
In order to compile and boot the board, I had to add a definition for g_mmu_config
, which I was confused about and
left empty initially just to get past the compilation stage. I also defined the GICR_OFFSET
and GICR_BASE
macros
for the GICv2 interrupt controller by copying them from the Allwinner chip, which used the same controller. After
reading further in Lup’s blog, I learned that the boot script has a PRINT
macro which is called early in the boot
process, and requires an implementation of up_lowputc
to print to the console. This would be the first thing I need
to implement. This compiled, but when I booted the Pi, nothing happened.
After quite a while of trying different things and looking at other implementations, I noticed that many people were
using register manipulation directly in the early print functions. I decided I would do the same, but instead of
printing (a more complex operation), I would turn one of the GPIO pins high. I was able to measure this with my
multimeter and confirm that the GPIO did get set, so I knew that the arm64_earlyprint_init
function was getting
called. Something was wrong with my UART configuration.
I then tried directly manipulating registers to put the text “hi” in the UART FIFO. When I booted again, this printed,
but then was followed by some garbled output. It appeared that the the char *
pointer passed to the print function
was getting garbled. After troubleshooting by printing characters directly by calling my arm64_lowputc
in the
assembly boot script, I discovered that I could print a string from the C definition if I declared the string as static.
I also investigated the elf generated by building and confirmed the string was located in .rodata
. I was suspicious
that I was loading the kernel incorrectly into memory and some addresses were getting mixed up. Sure enough, I had
defined the load address in the linker script as 0x80000
instead of 0x480000
. Fixing this allowed me to see the
boot messages properly!
I received this message in the console:
----gic_validate_dist_version: No GIC version detect
arm64_gic_initialize: no distributor detected, giving up ret=-19
_assert: Current Version: NuttX 12.6.0-RC0 6791d4a1c4-dirty Aug 4 2024 00:38:21 arm64
_assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x481418
I had accidentally kept the GICv3 in my config files when copying things from other boards, and changed it to GICv2. That resolved the issue and presented me with a new one:
MESS:00:00:06.144520:0:----_assert: Current Version: NuttX 12.6.0-RC0 f81fb7a076-dirty Aug 4 2024 16:16:30 arm64
_assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x4811e4
After enabling all of the debug output in the build options, this became:
arm64_oneshot_initialize: cycle_per_tick 54000
arm64_fatal_error: reason = 0
arm64_fatal_error: CurrentEL: MODE_EL1
arm64_fatal_error: ESR_ELn: 0xbf000002
arm64_fatal_error: FAR_ELn: 0x0
arm64_fatal_error: ELR_ELn: 0x48a458
print_ec_cause: SError interrupt
This looked like an unhandled interrupt, and after narrowing down which line was failing by adding log statements to the
kernel code, I discovered it was due to the spinlock code. An exception was being caused by the ldaxr
instruction,
which the ARM documentation said could only be used once the MMU was enabled. I then enabled the MMU as well as its
debug information and was greeted with the lovely error:
MESS:00:00:06.174977:0:----arm64_mmu_init: xlat tables:
arm64_mmu_init: base table(L1): 0x4cb000, 64 entries
arm64_mmu_init: 0: 0x4c4000
arm64_mmu_init: 1: 0x4c5000
arm64_mmu_init: 2: 0x4c6000
arm64_mmu_init: 3: 0x4c7000
arm64_mmu_init: 4: 0x4c8000
arm64_mmu_init: 5: 0x4c9000
arm64_mmu_init: 6: 0x4ca000
init_xlat_tables: mmap: virt 4227858432x phys 4227858432x size 67108864x
set_pte_table_desc:
set_pte_table_desc: 0x4cb018: [Table] 0x4c4000
init_xlat_tables: mmap: virt 0x phys 0x size 1006632960x
set_pte_table_desc:
set_pte_table_desc: 0x4cb000: [Table] 0x4c5000
init_xlat_tables: mmap: virt 4718592x phys 4718592x size 192512x
split_pte_block_desc: Splitting existing PTE 0x4c5010(L2)
set_pte_table_desc:
set_pte_table_desc: 0x4c5010: [Table] 0x4c6000
init_xlat_tables: mmap: virt 4911104x phys 4911104x size 81920x
init_xlat_tables: mmap: virt 4993024x phys 4993024x size 65536x
enable_mmu_el1: MMU enabled with dcache
nx_start: Entry
up_allocate_heap: heap_start=0x0x4d3000, heap_size=0x47b2d000
mm_initialize: Heap: name=Umem, start=0x4d3000 size=1202900992
mm_addregion: [Umem] Region 1: base=0x4d32a8 size=1202900304
arm64_fatal_error: reason = 0
arm64_fatal_error: CurrentEL: MODE_EL1
arm64_fatal_error: ESR_ELn: 0x96000045
arm64_fatal_error: FAR_ELn: 0x47fffff8
arm64_fatal_error: ELR_ELn: 0x489d28
print_ec_cause: Data Abort taken without a change in Exception level
_assert: Current Version: NuttX 12.6.0-RC0 96be557b64-dirty Aug 5 2024 14:56:42 arm64
_assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x481a34
up_dump_register: stack = 0x4d2e10
up_dump_register: x0: 0x13 x1: 0x4d32c0
up_dump_register: x2: 0xfe215040 x3: 0xfe215040
up_dump_register: x4: 0x0 x5: 0x0
up_dump_register: x6: 0x1 x7: 0xdba53f65cc808a8
up_dump_register: x8: 0xc4276feb17c016ba x9: 0xecbcfeb328124450
up_dump_register: x10: 0xb7989dd7d34a1280 x11: 0x5ebf5f572386fdee
up_dump_register: x12: 0x6f7c07d067f6e38 x13: 0x3f7b5adaf798b4d5
up_dump_register: x14: 0xf3dffbe2e4cff736 x15: 0xd76b1c050c964ea0
up_dump_register: x16: 0x6d6fa9cfeeb0eff8 x17: 0x1a051d808a830286
up_dump_register: x18: 0x3f7b5adaf798b4bf x19: 0x4d3000
up_dump_register: x20: 0x47fffff0 x21: 0x4d32d0
up_dump_register: x22: 0x47b2cd30 x23: 0x4d32a8
up_dump_register: x24: 0x4d32b0 x25: 0x4806f4
up_dump_register: x26: 0x2f56f66b2df71556 x27: 0x74ee6bbfb5d438f4
up_dump_register: x28: 0x7ef57ab47b85f74f x29: 0x9a7fa1cb06923003
up_dump_register: x30: 0x489cf8
up_dump_register:
up_dump_register: STATUS Registers:
up_dump_register: SPSR: 0x600002c5
up_dump_register: ELR: 0x489d28
up_dump_register: SP_EL0: 0x4d3000
up_dump_register: SP_ELX: 0x4d2f40
up_dump_register: TPIDR_EL0: 0x0
up_dump_register: TPIDR_EL1: 0x0
up_dump_register: EXE_DEPTH: 0x1
Some more debugging allowed me to determine that the CONFIG_RAM_START
and CONFIG_RAM_SIZE
macros in the
defconfig for my nsh configuration were still set to the values from the PinePhone that I copied from. I set these to
the correct values for the Raspberry Pi 4B and got much further!
MESS:00:00:06.211786:0:----irq_attach: In irq_attach
irq_attach: before spin_lock_irqsave
spin_lock_irqsave: me: 0
spin_lock_irqsave: before spin_lock
spin_lock: about to enter loop
spin_lock: loop over
spin_lock_irqsave: after spin_lock
irq_attach: after spin_lock_irqsave
irq_attach: before spin_unlock_irqrestore
irq_attach: after spin_unlock_irqrestore
arm64_serialinit: arm64_serialinit not implemented
group_setupidlefiles: ERROR: Failed to open stdin: -38
_assert: Current Version: NuttX 12.6.0-RC0 be262c7ad3-dirty Aug 5 2024 17:16:27 arm64
_assert: Assertion failed : at file: init/nx_start.c:728 task: Idle_Task process: Kernel 0x48162c
up_dump_register: stack = 0x4c0170
up_dump_register: x0: 0x4c0170 x1: 0x0
up_dump_register: x2: 0x0 x3: 0x0
up_dump_register: x4: 0x0 x5: 0x0
up_dump_register: x6: 0x3 x7: 0x0
up_dump_register: x8: 0x4c7468 x9: 0x0
up_dump_register: x10: 0x4c7000 x11: 0x4
up_dump_register: x12: 0x4b8000 x13: 0x4b7000
up_dump_register: x14: 0x1 x15: 0xfffffff7
up_dump_register: x16: 0x48a654 x17: 0x0
up_dump_register: x18: 0x1 x19: 0x0
up_dump_register: x20: 0x4ac181 x21: 0x4bf430
up_dump_register: x22: 0x0 x23: 0x4c0170
up_dump_register: x24: 0x4c0170 x25: 0x2d8
up_dump_register: x26: 0x240 x27: 0x4b7000
up_dump_register: x28: 0xfdc3ed41d6862df6 x29: 0xbf8e8f7280a0100
up_dump_register: x30: 0x481bf8
up_dump_register:
up_dump_register: STATUS Registers:
up_dump_register: SPSR: 0x20000245
up_dump_register: ELR: 0x480230
up_dump_register: SP_EL0: 0x4c7000
up_dump_register: SP_ELX: 0x4c6e90
up_dump_register: TPIDR_EL0: 0x4bf430
up_dump_register: TPIDR_EL1: 0x4bf430
up_dump_register: EXE_DEPTH: 0x0
dump_tasks: PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACKBASE STACKSIZE USED FILLED COMMAND
dump_tasks: ---- --- --- -------- ------- --- ------- ---------- ---------------- 0x4c4000 4096 144 3.5% irq
dump_task: 0 0 0 FIFO Kthread - Running 0000000000000000 0x4c5010 8176 1200 14.6% Idle_Task
CTRL-A Z for help | 115200 8N1 | NOR | Minicom 2.9 | VT102 | Offline | ttyUSB0
We actually got into tasks now! It appears stdin failed to open because in my Mini-UART driver implementation I had the
attach
and ioctl
functions return -ENOSYS
. Just changing this to 0 for success in the interim allowed us to
get even further, and I could see the beginnings of NSH spawning.
mm_initialize: Heap: name=Umem, start=0x4cc000 size=4222828544
mm_addregion: [Umem] Region 1: base=0x4cc2a8 size=4222827856
mm_malloc: Allocated 0x4cc2d0, size 144
mm_malloc: Allocated 0x4cc360, size 80
gic_validate_dist_version: GICv2 detected
up_timer_initialize: up_timer_initialize: cp15 timer(s) running at 54.0MHz
arm64_oneshot_initialize: oneshot_initialize
mm_malloc: Allocated 0x4cc3b0, size 48
arm64_oneshot_initialize: cycle_per_tick 54000
uart_register: Registering /dev/console
mm_malloc: Allocated 0x4cc3e0, size 80
mm_malloc: Allocated 0x4cc430, size 80
uart_register: Registering /dev/ttys0
mm_malloc: Allocated 0x4cc480, size 80
mm_malloc: Allocated 0x4cc4d0, size 80
mm_malloc: Allocated 0x4cc520, size 80
mm_malloc: Allocated 0x4cc570, size 32
mm_malloc: Allocated 0x4cc590, size 64
work_start_highpri: Starting high-priority kernel worker thread(s)
mm_malloc: Allocated 0x4cc5d0, size 336
mm_malloc: Allocated 0x4cc720, size 8208
nxtask_activate: hpwork pid=1,TCB=0x4cc5d0
nx_start_application: Starting init thread
task_spawn: name=nsh_main entry=0x48b24c file_actions=0 attr=0x4cbfa0 argv=0x4cbf98
mm_malloc: Allocated 0x4ce730, size 1536
mm_malloc: Allocated 0x4ced30, size 64
mm_malloc: Allocated 0x4ced70, size 32
mm_malloc: Allocated 0x4ced90, size 8208
nxtask_activate: nsh_main pid=2,TCB=0x4ce730
lib_cxx_initialize: _sinit: 0x4ad000 _einit: 0x4ad000
mm_malloc: Allocated 0x4d0da0, size 848
mm_free: Freeing 0x4d0da0
mm_free: Freeing 0x4ced70
mm_free: Freeing 0x4ced30
nxtask_exit: nsh_main pid=2,TCB=0x4ce730
mm_free: Freeing 0x4ced90
mm_free: Freeing 0x4ce730
nx_start: CPU0: Beginning Idle Loop
It seemed like we were waiting on an interrupt which never occurred. This was weird, because my Mini-UART driver had an interrupt implementation and appeared to be written just fine. This took hours of debugging, logging from interrupt handlers and dumping register values, but eventually I determined that the BCM2711 datasheet actually had an error where the TX and RX interrupt fields were swapped in the datasheet. A blog post online had mentioned this for the BCM2835, but it appeared to be an issue on this chip as well. Now we were booting into NSH!
It was at this point that the port is considered a success, since I was able to boot into NSH and successfully run the
ostest
benchmark. I went on to write the start of a few more drivers, like the GPIO driver, but this completed the
requirements for an initial port and is most of what ended up being submitted in the initial pull request.