The machine level SBI ( Supervisor Binary Interface ) is exported to the Linux kernel by mapping it at the top of the address space.
The mapping is performed by BBL in
supervisor_vm_init
defined in riscv-tools/riscv-pk/bbl/bbl.c
. // map SBI at top of vaddr space
extern char _sbi_end;
uintptr_t num_sbi_pages = ((uintptr_t)&_sbi_end - DRAM_BASE - 1) / RISCV_PGSIZE + 1;
assert(num_sbi_pages <= (1 << RISCV_PGLEVEL_BITS));
for (uintptr_t i = 0; i < num_sbi_pages; i++) {
uintptr_t idx = (1 << RISCV_PGLEVEL_BITS) - num_sbi_pages + i;
sbi_pt[idx] = pte_create((DRAM_BASE / RISCV_PGSIZE) + i, PTE_G | PTE_R | PTE_X);
}
pte_t* sbi_pte = middle_pt + ((num_middle_pts << RISCV_PGLEVEL_BITS)-1);
assert(!*sbi_pte);
*sbi_pte = ptd_create((uintptr_t)sbi_pt >> RISCV_PGSHIFT);
You can read more on
supervisor_vm_init
here https://github.com/slavaim/riscv-notes/blob/master/bbl/supervisor_vm_init.md . From the code above you can see that the last page ending at _sbi_end
physical address is mapped at the last page of the supervisor virtual address space.
The offsets to SBI entry points are defined in
riscv-tools/riscv-pk/machine/sbi.S
as.globl sbi_hart_id; sbi_hart_id = -2048
.globl sbi_num_harts; sbi_num_harts = -2032
.globl sbi_query_memory; sbi_query_memory = -2016
.globl sbi_console_putchar; sbi_console_putchar = -2000
.globl sbi_console_getchar; sbi_console_getchar = -1984
.globl sbi_send_ipi; sbi_send_ipi = -1952
.globl sbi_clear_ipi; sbi_clear_ipi = -1936
.globl sbi_timebase; sbi_timebase = -1920
.globl sbi_shutdown; sbi_shutdown = -1904
.globl sbi_set_timer; sbi_set_timer = -1888
.globl sbi_mask_interrupt; sbi_mask_interrupt = -1872
.globl sbi_unmask_interrupt; sbi_unmask_interrupt = -1856
.globl sbi_remote_sfence_vm; sbi_remote_sfence_vm = -1840
.globl sbi_remote_sfence_vm_range; sbi_remote_sfence_vm_range = -1824
.globl sbi_remote_fence_i; sbi_remote_fence_i = -1808
These definitions are offsets from the top of the address space for the SBI trampoline stubs defined in
riscv-tools/riscv-pk/machine/sbi_entry.S
# hart_id
.align 4
li a7, MCALL_HART_ID
ecall
ret
# num_harts
.align 4
lw a0, num_harts
ret
# query_memory
.align 4
tail __sbi_query_memory
The SBI trampoline stubs code start is defined as
sbi_base
and is aligned to a page boundary by align RISCV_PGSHIFT
directive. The first RISCV_PGSIZE - 2048
bytes are reserved by .skip RISCV_PGSIZE - 2048
directive so the first instruction starts at 2048 bytes offset from the page top defined as.align RISCV_PGSHIFT
.globl sbi_base
sbi_base:
# TODO: figure out something better to do with this space. It's not
# protected from the OS, so beware.
.skip RISCV_PGSIZE - 2048
The end of the section is also aligned at the page boundary and is defined as
.align RISCV_PGSHIFT
.globl _sbi_end
_sbi_end:
The SBI trampoline stubs section
.sbi
is placed at the end of BBL just before the payload by defining the layout in riscv-tools/riscv-pk/bbl/bbl.lds
as .sbi :
{
*(.sbi)
}
.payload :
{
*(.payload)
}
_end = .;
So the
supervisor_vm_init
code that maps machine level physical addresses to supervisor virtuall addresses maps the BBL .sbi
section which contains SBI trampoline stubs at the top of the supervisor virtual address space.
Linux kernel access SBI trampoline stubs by a call with offsets defined in
linux/linux-4.6.2/arch/riscv/kernel/sbi.S
which is a carbon copy of riscv-tools/riscv-pk/machine/sbi.S
. For example a snippet from Linux kernel entry point _start
defined in linux/linux-4.6.2/arch/riscv/kernel/head.S
/* See if we're the main hart */
call sbi_hart_id
bnez a0, .Lsecondary_start
This code is translated by GCC to
0xffffffff80000018 <+24>: jalr -2048(zero) # 0xfffffffffffff800
0xffffffff8000001c <+28>: bnez a0,0xffffffff80000054 <_start+84>
The address
0xfffffffffffff800
is 2048 bytes offset from the top of the virtual address space last page. As we saw above this page is backed by a physical page with SBI trampoline stubs code starting at sbi_base
machine level physical address. The dissassembling shows the SBI trampoline stubs at offsets 0xfffffffffffff800 which is sbi_hart_id = -2048
0xfffffffffffff810 which is sbi_num_harts = -2032
0xfffffffffffff820 which is sbi_query_memory = -2016
0xfffffffffffff830 which is sbi_console_putchar = -2000
etc
(gdb) x/48i 0xfffffffffffff800
0xfffffffffffff800: li a7,0
0xfffffffffffff804: ecall
0xfffffffffffff808: ret
0xfffffffffffff80c: nop
0xfffffffffffff810: auipc a0,0xfffff
0xfffffffffffff814: lw a0,-1888(a0)
0xfffffffffffff818: ret
0xfffffffffffff81c: nop
0xfffffffffffff820: j 0xffffffffffff92c0
0xfffffffffffff824: nop
0xfffffffffffff828: nop
0xfffffffffffff82c: nop
0xfffffffffffff830: li a7,1
0xfffffffffffff834: ecall
0xfffffffffffff838: ret
0xfffffffffffff83c: nop
0xfffffffffffff840: li a7,2
0xfffffffffffff844: ecall
0xfffffffffffff848: ret
0xfffffffffffff84c: nop
0xfffffffffffff850: unimp
0xfffffffffffff854: nop
0xfffffffffffff858: nop
0xfffffffffffff85c: nop
0xfffffffffffff860: li a7,4
0xfffffffffffff864: ecall
0xfffffffffffff868: ret
0xfffffffffffff86c: nop
0xfffffffffffff870: li a7,5
0xfffffffffffff874: ecall
0xfffffffffffff878: ret
0xfffffffffffff87c: nop
0xfffffffffffff880: lui a0,0x989
0xfffffffffffff884: addiw a0,a0,1664
0xfffffffffffff888: ret
0xfffffffffffff88c: nop
0xfffffffffffff890: li a7,6
0xfffffffffffff894: ecall
0xfffffffffffff898: nop
0xfffffffffffff89c: nop
0xfffffffffffff8a0: li a7,7
0xfffffffffffff8a4: ecall
0xfffffffffffff8a8: ret
0xfffffffffffff8ac: nop
0xfffffffffffff8b0: j 0xffffffffffff92f8
0xfffffffffffff8b4: nop
0xfffffffffffff8b8: nop
0xfffffffffffff8bc: nop
As you can see not all SBI trampolines stubs invoke
ecall
system call to enter a higher privilege level, the machine level in this case. For example query_memory
is just an unconditional jump to the SBI code mapped to the Linux kernel space. 0xfffffffffffff820: j 0xffffffffffff92c0
In that case the CPU doesn't switch to machine level and continues in the supervisor mode with virtual memory enabled. When CPU switches to the machine mode it disables virtual address translation and switches back to physical addresses. Below is a call stack when
query_memory
is called. A you can see the CPU continues with virtual address memory enabled and uses virtual addresses. The debugger was unable to resolve a call to query_memory
in BBL as it was not aware about the code being remapped to the Linux system address space.#0 0xffffffffffff92c8 in ?? ()
#1 0xffffffff80002c38 in setup_bootmem () at arch/riscv/kernel/setup.c:149
#2 setup_arch (cmdline_p=<optimized out>) at arch/riscv/kernel/setup.c:152
#3 0xffffffff80000898 in start_kernel () at init/main.c:500
#4 0xffffffff80000040 in _start () at arch/riscv/kernel/head.S:36
I guess that one of the possible reasons for such
query_memory
implementation is to simplify development as this function returns the structure which would require either packing it in registers or translating addresses either in the Linux kernel or in BBL SDI.
The call stack for
sbi_hart_id
looks differently#0 0x0000000080000c90 in mcall_trap (regs=0x82660ec0, mcause=9, mepc=18446744073709549572) at ../machine/mtrap.c:210
#1 0x00000000800000ec in trap_vector () at ../machine/mentry.S:116
Backtrace stopped: frame did not save the PC
(gdb) p/x mepc
$4 = 0xfffffffffffff804
The virtual address translation is disabled and the CPU works with physical addresses. The debugger was unable to cross the boundary back to the Linux kernel stack that requires processing address space translation switching. As you can see the
mpec
register points to the ecall
instruction virtual address in supervisor mode 0xfffffffffffff800: li a7,0
0xfffffffffffff804: ecall
0xfffffffffffff808: ret
No comments:
Post a Comment