.\" $NetBSD: uvm.9,v 1.114 2017/07/03 21:28:48 wiz Exp $
.\"
.\" Copyright (c) 1998 Matthew R. Green
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 23, 2015
.Dt UVM 9
.Os
.Sh NAME
.Nm uvm
.Nd virtual memory system external interface
.Sh SYNOPSIS
.In sys/param.h
.In uvm/uvm.h
.Sh DESCRIPTION
The UVM virtual memory system manages access to the computer's memory
resources.
User processes and the kernel access these resources through
UVM's external interface.
UVM's external interface includes functions that:
.Pp
.Bl -hyphen -compact
.It
initialize UVM sub-systems
.It
manage virtual address spaces
.It
resolve page faults
.It
memory map files and devices
.It
perform uio-based I/O to virtual memory
.It
allocate and free kernel virtual memory
.It
allocate and free physical memory
.El
.Pp
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper.
The pagedaemon process sleeps until physical memory becomes scarce.
When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not
been recently used.
The swapper process swaps in runnable processes that are currently swapped
out, if there is room.
.Pp
There are also several miscellaneous functions.
.Sh INITIALIZATION
.Bl -ohang
.It Ft void
.Fn uvm_init "void" ;
.It Ft void
.Fn uvm_init_limits "struct lwp *l" ;
.It Ft void
.Fn uvm_setpagesize "void" ;
.It Ft void
.Fn uvm_swap_init "void" ;
.El
.Pp
.Fn uvm_init
sets up the UVM system at system boot time, after the
console has been setup.
It initializes global state, the page, map, kernel virtual memory state,
machine-dependent physical map, kernel memory allocator,
pager and anonymous memory sub-systems, and then enables
paging of kernel objects.
.Pp
.Fn uvm_init_limits
initializes process limits for the named process.
This is for use by the system startup for process zero, before any
other processes are created.
.Pp
.Fn uvm_md_init
does early boot initialization.
This currently includes:
.Fn uvm_setpagesize
which initializes the uvmexp members pagesize (if not already done by
machine-dependent code), pageshift and pagemask.
.Fn uvm_physseg_init
which initialises the
.Xr uvm_hotplug 9
subsystem.
It should be called by machine-dependent code early in the
.Fn pmap_init
call (see
.Xr pmap 9 ) .
.Pp
.Fn uvm_swap_init
initializes the swap sub-system.
.Sh VIRTUAL ADDRESS SPACE MANAGEMENT
See
.Xr uvm_map 9 .
.Sh PAGE FAULT HANDLING
.Bl -ohang
.It Ft int
.Fn uvm_fault "struct vm_map *orig_map" "vaddr_t vaddr" "vm_prot_t access_type" ;
.El
.Pp
.Fn uvm_fault
is the main entry point for faults.
It takes
.Fa orig_map
as the map the fault originated in, a
.Fa vaddr
offset into the map the fault occurred, and
.Fa access_type
describing the type of access requested.
.Fn uvm_fault
returns a standard UVM return value.
.Sh MEMORY MAPPING FILES AND DEVICES
See
.Xr ubc 9 .
.Sh VIRTUAL MEMORY I/O
.Bl -ohang
.It Ft int
.Fn uvm_io "struct vm_map *map" "struct uio *uio" ;
.El
.Pp
.Fn uvm_io
performs the I/O described in
.Fa uio
on the memory described in
.Fa map .
.Sh ALLOCATION OF KERNEL MEMORY
See
.Xr uvm_km 9 .
.Sh ALLOCATION OF PHYSICAL MEMORY
.Bl -ohang
.It Ft struct vm_page *
.Fn uvm_pagealloc "struct uvm_object *uobj" "voff_t off" "struct vm_anon *anon" "int flags" ;
.It Ft void
.Fn uvm_pagerealloc "struct vm_page *pg" "struct uvm_object *newobj" "voff_t newoff" ;
.It Ft void
.Fn uvm_pagefree "struct vm_page *pg" ;
.It Ft int
.Fn uvm_pglistalloc "psize_t size" "paddr_t low" "paddr_t high" "paddr_t alignment" "paddr_t boundary" "struct pglist *rlist" "int nsegs" "int waitok" ;
.It Ft void
.Fn uvm_pglistfree "struct pglist *list" ;
.It Ft void
.Fn uvm_page_physload "paddr_t start" "paddr_t end" "paddr_t avail_start" "paddr_t avail_end" "int free_list" ;
.El
.Pp
.Fn uvm_pagealloc
allocates a page of memory at virtual address
.Fa off
in either the object
.Fa uobj
or the anonymous memory
.Fa anon ,
which must be locked by the caller.
Only one of
.Fa uobj
and
.Fa anon
can be non
.Dv NULL .
Returns
.Dv NULL
when no page can be found.
The flags can be any of
.Bd -literal
#define UVM_PGA_USERESERVE 0x0001 /* ok to use reserve pages */
#define UVM_PGA_ZERO 0x0002 /* returned page must be zero'd */
.Ed
.Pp
.Dv UVM_PGA_USERESERVE
means to allocate a page even if that will result in the number of free pages
being lower than
.Dv uvmexp.reserve_pagedaemon
(if the current thread is the pagedaemon) or
.Dv uvmexp.reserve_kernel
(if the current thread is not the pagedaemon).
.Dv UVM_PGA_ZERO
causes the returned page to be filled with zeroes, either by allocating it
from a pool of pre-zeroed pages or by zeroing it in-line as necessary.
.Pp
.Fn uvm_pagerealloc
reallocates page
.Fa pg
to a new object
.Fa newobj ,
at a new offset
.Fa newoff .
.Pp
.Fn uvm_pagefree
frees the physical page
.Fa pg .
If the content of the page is known to be zero-filled,
caller should set
.Dv PG_ZERO
in pg->flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_pglistalloc
allocates a list of pages for size
.Fa size
byte under various constraints.
.Fa low
and
.Fa high
describe the lowest and highest addresses acceptable for the list.
If
.Fa alignment
is non-zero, it describes the required alignment of the list, in
power-of-two notation.
If
.Fa boundary
is non-zero, no segment of the list may cross this power-of-two
boundary, relative to zero.
.Fa nsegs
is the maximum number of physically contiguous segments.
If
.Fa waitok
is non-zero, the function may sleep until enough memory is available.
(It also may give up in some situations, so a non-zero
.Fa waitok
does not imply that
.Fn uvm_pglistalloc
cannot return an error.)
The allocated memory is returned in the
.Fa rlist
list; the caller has to provide storage only, the list is initialized by
.Fn uvm_pglistalloc .
.Pp
.Fn uvm_pglistfree
frees the list of pages pointed to by
.Fa list .
If the content of the page is known to be zero-filled,
caller should set
.Dv PG_ZERO
in pg->flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_page_physload
loads physical memory segments into VM space on the specified
.Fa free_list .
It must be called at system boot time to set up physical memory
management pages.
The arguments describe the
.Fa start
and
.Fa end
of the physical addresses of the segment, and the available start and end
addresses of pages not already in use.
If a system has memory banks of
different speeds the slower memory should be given a higher
.Fa free_list
value.
.\" XXX expand on "system boot time"!
.Sh PROCESSES
.Bl -ohang
.It Ft void
.Fn uvm_pageout "void" ;
.It Ft void
.Fn uvm_scheduler "void" ;
.El
.Pp
.Fn uvm_pageout
is the main loop for the page daemon.
.Pp
.Fn uvm_scheduler
is the process zero main loop, which is to be called after the
system has finished starting other processes.
It handles the swapping in of runnable, swapped out processes in priority
order.
.Sh PAGE LOAN
.Bl -ohang
.It Ft int
.Fn uvm_loan "struct vm_map *map" "vaddr_t start" "vsize_t len" "void *v" "int flags" ;
.It Ft void
.Fn uvm_unloan "void *v" "int npages" "int flags" ;
.El
.Pp
.Fn uvm_loan
loans pages in a map out to anons or to the kernel.
.Fa map
should be unlocked,
.Fa start
and
.Fa len
should be multiples of
.Dv PAGE_SIZE .
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON 0x01 /* loan to anons */
#define UVM_LOAN_TOPAGE 0x02 /* loan to kernel */
.Ed
.Pp
.Fa v
should be pointer to array of pointers to
.Li struct anon
or
.Li struct vm_page ,
as appropriate.
The caller has to allocate memory for the array and
ensure it's big enough to hold
.Fa len / PAGE_SIZE
pointers.
Returns 0 for success, or appropriate error number otherwise.
Note that wired pages can't be loaned out and
.Fn uvm_loan
will fail in that case.
.Pp
.Fn uvm_unloan
kills loans on pages or anons.
The
.Fa v
must point to the array of pointers initialized by previous call to
.Fn uvm_loan .
.Fa npages
should match number of pages allocated for loan, this also matches
number of items in the array.
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON 0x01 /* loan to anons */
#define UVM_LOAN_TOPAGE 0x02 /* loan to kernel */
.Ed
.Pp
and should match what was used for previous call to
.Fn uvm_loan .
.Sh MISCELLANEOUS FUNCTIONS
.Bl -ohang
.It Ft struct uvm_object *
.Fn uao_create "vsize_t size" "int flags" ;
.It Ft void
.Fn uao_detach "struct uvm_object *uobj" ;
.It Ft void
.Fn uao_reference "struct uvm_object *uobj" ;
.It Ft bool
.Fn uvm_chgkprot "void *addr" "size_t len" "int rw" ;
.It Ft void
.Fn uvm_kernacc "void *addr" "size_t len" "int rw" ;
.It Ft int
.Fn uvm_vslock "struct vmspace *vs" "void *addr" "size_t len" "vm_prot_t prot" ;
.It Ft void
.Fn uvm_vsunlock "struct vmspace *vs" "void *addr" "size_t len" ;
.It Ft void
.Fn uvm_meter "void" ;
.It Ft void
.Fn uvm_proc_fork "struct proc *p1" "struct proc *p2" "bool shared" ;
.It Ft int
.Fn uvm_grow "struct proc *p" "vaddr_t sp" ;
.It Ft void
.Fn uvn_findpages "struct uvm_object *uobj" "voff_t offset" "int *npagesp" "struct vm_page **pps" "int flags" ;
.It Ft void
.Fn uvm_vnp_setsize "struct vnode *vp" "voff_t newsize" ;
.El
.Pp
The
.Fn uao_create ,
.Fn uao_detach ,
and
.Fn uao_reference
functions operate on anonymous memory objects, such as those used to support
System V shared memory.
.Fn uao_create
returns an object of size
.Fa size
with flags:
.Bd -literal
#define UAO_FLAG_KERNOBJ 0x1 /* create kernel object */
#define UAO_FLAG_KERNSWAP 0x2 /* enable kernel swap */
.Ed
.Pp
which can only be used once each at system boot time.
.Fn uao_reference
creates an additional reference to the named anonymous memory object.
.Fn uao_detach
removes a reference from the named anonymous memory object, destroying
it if removing the last reference.
.Pp
.Fn uvm_chgkprot
changes the protection of kernel memory from
.Fa addr
to
.Fa addr + len
to the value of
.Fa rw .
This is primarily useful for debuggers, for setting breakpoints.
This function is only available with options
.Dv KGDB .
.Pp
.Fn uvm_kernacc
checks the access at address
.Fa addr
to
.Fa addr + len
for
.Fa rw
access in the kernel address space.
.Pp
.Fn uvm_vslock
and
.Fn uvm_vsunlock
control the wiring and unwiring of pages for process
.Fa p
from
.Fa addr
to
.Fa addr + len .
These functions are normally used to wire memory for I/O.
.Pp
.Fn uvm_meter
calculates the load average.
.Pp
.Fn uvm_proc_fork
forks a virtual address space for process' (old)
.Fa p1
and (new)
.Fa p2 .
If the
.Fa shared
argument is non zero, p1 shares its address space with p2,
otherwise a new address space is created.
This function currently has no return value, and thus cannot fail.
In the future, this function will be changed to allow it to
fail in low memory conditions.
.Pp
.Fn uvm_grow
increases the stack segment of process
.Fa p
to include
.Fa sp .
.Pp
.Fn uvn_findpages
looks up or creates pages in
.Fa uobj
at offset
.Fa offset ,
marks them busy and returns them in the
.Fa pps
array.
Currently
.Fa uobj
must be a vnode object.
The number of pages requested is pointed to by
.Fa npagesp ,
and this value is updated with the actual number of pages returned.
The flags can be any bitwise inclusive-or of:
.Pp
.Bl -tag -offset abcd -compact -width UVM_ADV_SEQUENTIAL
.It Dv UFP_ALL
Zero pseudo-flag meaning return all pages.
.It Dv UFP_NOWAIT
Don't sleep \(em yield
.Dv NULL
for busy pages or for uncached pages for which allocation would sleep.
.It Dv UFP_NOALLOC
Don't allocate \(em yield
.Dv NULL
for uncached pages.
.It Dv UFP_NOCACHE
Don't use cached pages \(em yield
.Dv NULL
instead.
.It Dv UFP_NORDONLY
Don't yield read-only pages \(em yield
.Dv NULL
for pages marked
.Dv PG_READONLY .
.It Dv UFP_DIRTYONLY
Don't yield clean pages \(em stop early at the first clean one.
As a side effect, mark yielded dirty pages clean.
Caller must write them to permanent storage before unbusying.
.It Dv UFP_BACKWARD
Traverse pages in reverse order.
If
.Fn uvn_findpages
returns early, it will have filled
.Li * Ns Fa npagesp
entries at the end of
.Fa pps
rather than the beginning.
.El
.Pp
.Fn uvm_vnp_setsize
sets the size of vnode
.Fa vp
to
.Fa newsize .
Caller must hold a reference to the vnode.
If the vnode shrinks, pages no longer used are discarded.
.Sh MISCELLANEOUS MACROS
.Bl -ohang
.It Ft paddr_t
.Fn atop "paddr_t pa" ;
.It Ft paddr_t
.Fn ptoa "paddr_t pn" ;
.It Ft paddr_t
.Fn round_page "address" ;
.It Ft paddr_t
.Fn trunc_page "address" ;
.El
.Pp
The
.Fn atop
macro converts a physical address
.Fa pa
into a page number.
The
.Fn ptoa
macro does the opposite by converting a page number
.Fa pn
into a physical address.
.Pp
.Fn round_page
and
.Fn trunc_page
macros return a page address boundary from rounding
.Fa address
up and down, respectively, to the nearest page boundary.
These macros work for either addresses or byte counts.
.Sh SYSCTL
UVM provides support for the
.Dv CTL_VM
domain of the
.Xr sysctl 3
hierarchy.
It handles the
.Dv VM_LOADAVG ,
.Dv VM_METER ,
.Dv VM_UVMEXP ,
and
.Dv VM_UVMEXP2
nodes, which return the current load averages, calculates current VM
totals, returns the uvmexp structure, and a kernel version independent
view of the uvmexp structure, respectively.
It also exports a number of tunables that control how much VM space is
allowed to be consumed by various tasks.
The load averages are typically accessed from userland using the
.Xr getloadavg 3
function.
The uvmexp structure has all global state of the UVM system,
and has the following members:
.Bd -literal
/* vm_page constants */
int pagesize; /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask; /* page mask */
int pageshift; /* page shift */
/* vm_page counters */
int npages; /* number of pages we manage */
int free; /* number of free pages */
int paging; /* number of pages in the process of being paged out */
int wired; /* number of wired pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */
/* pageout params */
int freemin; /* min number of free pages */
int freetarg; /* target number of free pages */
int inactarg; /* target number of inactive pages */
int wiredmax; /* max number of wired pages */
/* swap */
int nswapdev; /* number of configured swap devices in system */
int swpages; /* number of PAGE_SIZE'ed swap pages */
int swpginuse; /* number of swap pages in use */
int nswget; /* number of times fault calls uvm_swap_get() */
int nanon; /* number total of anon's in system */
int nfreeanon; /* number of free anon's */
/* stat counters */
int faults; /* page fault count */
int traps; /* trap count */
int intrs; /* interrupt count */
int swtch; /* context switch count */
int softs; /* software interrupt count */
int syscalls; /* system calls */
int pageins; /* pagein operation count */
/* pageouts are in pdpageouts below */
int pgswapin; /* pages swapped in */
int pgswapout; /* pages swapped out */
int forks; /* forks */
int forks_ppwait; /* forks where parent waits */
int forks_sharevm; /* forks where vmspace is shared */
/* fault subcounters */
int fltnoram; /* number of times fault was out of ram */
int fltnoanon; /* number of times fault was out of anons */
int fltpgwait; /* number of times fault had to wait on a page */
int fltpgrele; /* number of times fault found a released page */
int fltrelck; /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget; /* number of times fault gets anon page */
int fltanretry; /* number of times fault retrys an anon get */
int fltamcopy; /* number of times fault clears "needs copy" */
int fltnamap; /* number of times fault maps a neighbor anon page */
int fltnomap; /* number of times fault maps a neighbor obj page */
int fltlget; /* number of times fault does a locked pgo_get */
int fltget; /* number of times fault does an unlocked get */
int flt_anon; /* number of times fault anon (case 1a) */
int flt_acow; /* number of times fault anon cow (case 1b) */
int flt_obj; /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */
/* daemon counters */
int pdwoke; /* number of times daemon woke up */
int pdrevs; /* number of times daemon rev'd clock hand */
int pdfreed; /* number of pages daemon freed since boot */
int pdscans; /* number of pages daemon scanned since boot */
int pdanscan; /* number of anonymous pages scanned by daemon */
int pdobscan; /* number of object pages scanned by daemon */
int pdreact; /* number of pages daemon reactivated since boot */
int pdbusy; /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending; /* number of times daemon got a pending pageout */
int pddeact; /* number of pages daemon deactivates */
.Ed
.Sh NOTES
.Fn uvm_chgkprot
is only available if the kernel has been compiled with options
.Dv KGDB .
.Pp
All structure and types whose names begin with
.Dq vm_
will be renamed to
.Dq uvm_ .
.Sh SEE ALSO
.Xr swapctl 2 ,
.Xr getloadavg 3 ,
.Xr kvm 3 ,
.Xr sysctl 3 ,
.Xr ddb 4 ,
.Xr options 4 ,
.Xr memoryallocators 9 ,
.Xr pmap 9 ,
.Xr ubc 9 ,
.Xr uvm_km 9 ,
.Xr uvm_map 9
.Rs
.%A Charles D. Cranor
.%A Gurudatta M. Parulkar
.%T "The UVM Virtual Memory System"
.%I USENIX Association
.%B Proceedings of the USENIX Annual Technical Conference
.%P 117-130
.%D June 6-11, 1999
.%U http://www.usenix.org/event/usenix99/full_papers/cranor/cranor.pdf
.Re
.Sh HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri).
UVM's roots lie partly in the Mach-based
.Bx 4.4
VM system, the
.Fx
VM system, and the SunOS 4 VM system.
UVM's basic structure is based on the
.Bx 4.4
VM system.
UVM's new anonymous memory system is based on the
anonymous memory system found in the SunOS 4 VM (as described in papers
published by Sun Microsystems, Inc.).
UVM also includes a number of features new to
.Bx
including page loanout, map entry passing, simplified
copy-on-write, and clustered anonymous memory pageout.
UVM is also further documented in an August 1998 dissertation by
Charles D. Cranor.
.Pp
UVM appeared in
.Nx 1.4 .
.Sh AUTHORS
.An -nosplit
.An Charles D. Cranor
.Aq Mt chuck@ccrc.wustl.edu
designed and implemented UVM.
.Pp
.An Matthew Green
.Aq Mt mrg@eterna.com.au
wrote the swap-space management code and handled the logistical issues
involved with merging UVM into the
.Nx
source tree.
.Pp
.An Chuck Silvers
.Aq Mt chuq@chuq.com
implemented the aobj pager, thus allowing UVM to support System V shared
memory and process swapping.