.\" $NetBSD: 1.me,v 1.1 1998/07/15 00:34:54 thorpej Exp $
.\"
.\" Copyright (c) 1998 Jason R. Thorpe.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgements:
.\" This product includes software developed for the NetBSD Project
.\" by Jason R. Thorpe.
.\" 4. The name of the author may not be used to endorse or promote products
.\" derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.sh 1 "Introduction"
.pp
NetBSD is a portable, modern UNIX-like operating system which currently
runs on eighteen platforms covering nine processor architectures. Some
of these platforms, including the Alpha and i386\**, share the PCI bus
.(f
\**The term "i386" is used here to refer to all of the 386-class and higher
processors, including the i486, Pentium, Pentium Pro, and Pentium II.
.)f
as a common architectural feature.
In order to share device drivers for PCI devices between different
platforms, abstractions that hide the details of bus access must be
invented. The details that must be hidden can be broken down into
two classes: CPU access to devices on the bus (\fIbus_space\fR)
and device access to host memory (\fIbus_dma\fR). Here we will discuss
the latter; \fIbus_space\fR is a complicated topic in and of itself, and
is beyond the scope of this paper.
.pp
Within the scope of DMA, there are two broad classes of details
that must be hidden from the core device driver.
The first class, host details, deals with issues such as
the physical mapping of system memory (and the DMA mechanisms employed
as a result of such mapping) and cache semantics. The second
class, bus details, deals with issues related to features or
limitations specific to the bus to which a device is attached, such
as DMA bursting and address line limitations.
.sh 2 "Host platform details"
.pp
In the example platforms listed above, there are at least three different
mechanisms used to perform DMA. The first is used by the i386 platform.
This mechanism can be described as "what you see is what you get":
the address that the device uses to perform the DMA transfer is the same
address that the host CPU uses to access the memory location in question.
.so figure1.pic
.pp
The second mechanism,
employed by the Alpha, is very similar to the first; the address the
device uses to perform the DMA transfer is the address
the host CPU uses to access the memory location in question, offset by
some base address at which host memory is direct-mapped on the device bus
for the purpose of DMA.
.so figure2.pic
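.pp
The first two mechanisms can be illustrated with a minimal sketch.
The structure and function names here are hypothetical and are not part
of any NetBSD interface; the "what you see is what you get" case is
simply treated as a direct mapping with a base of zero.
.(l
```c
#include <stdint.h>

/* Hypothetical description of a direct-mapped DMA region. */
struct direct_map {
	uint64_t dma_base;	/* bus address at which host memory appears */
};

/*
 * Translate a host physical address into the address the device
 * must present on the bus.  With dma_base == 0 this degenerates
 * to the i386-style case, where both addresses are identical.
 */
uint64_t
direct_map_to_dma(const struct direct_map *dm, uint64_t host_pa)
{
	return dm->dma_base + host_pa;
}
```
.)l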
.pp
The third mechanism, scatter-gather-mapped DMA, employs an MMU which performs
translation of DMA addresses to host memory physical addresses. This
mechanism is also used by the Alpha, because Alpha platforms implement a
physical address space sometimes significantly larger than the 32-bit
address space supported by most currently-available PCI devices.
.so figure3.pic
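.pp
As a rough illustration of scatter-gather-mapped DMA, the following
sketch models the translation that such an MMU might perform. The page
size, table size, and all names are assumptions made for this example
only; real hardware defines its own page table format.
.(l
```c
#include <stdint.h>

#define SG_PAGE_SHIFT	13	/* assumed 8KB pages */
#define SG_PAGE_SIZE	((uint64_t)1 << SG_PAGE_SHIFT)
#define SG_NENTRIES	8	/* assumed (tiny) table size */

/* Hypothetical software model of a scatter-gather map. */
struct sg_map {
	uint64_t dma_base;		/* first DMA address covered */
	uint64_t pte[SG_NENTRIES];	/* host physical page per DMA page */
	int	 valid[SG_NENTRIES];	/* nonzero if the entry is loaded */
};

/*
 * Translate a DMA address to a host physical address.
 * Returns 0 on success, -1 if the address is not mapped.
 */
int
sg_translate(const struct sg_map *m, uint64_t dma_addr, uint64_t *host_pa)
{
	uint64_t off, idx;

	if (dma_addr < m->dma_base)
		return -1;
	off = dma_addr - m->dma_base;
	idx = off >> SG_PAGE_SHIFT;
	if (idx >= SG_NENTRIES || !m->valid[idx])
		return -1;
	*host_pa = m->pte[idx] + (off & (SG_PAGE_SIZE - 1));
	return 0;
}
```
.)l
.pp
Because each DMA page is translated independently, host pages that are
scattered throughout a large physical address space can be presented to
the device as a contiguous range of 32-bit DMA addresses.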
.pp
The second and third DMA mechanisms above are combined on the Alpha through
the use of \fIDMA windows\fR. The ASIC which implements the PCI bus
on a particular platform has at least two of these DMA windows. Each
window may be configured for direct-mapped or scatter-gather-mapped
DMA. Windows are chosen based on the type of DMA transfer being performed,
the bus type, and the physical address range of the host memory being
accessed.
.pp
These concepts apply to platforms other than those listed above
and busses other than PCI. Similar issues exist with the TURBOchannel bus
used on DECstations and early Alpha systems, and with the Q-bus used on
some DEC MIPS and VAX-based servers.
.pp
The semantics of the host system's cache are also important to devices
which wish to perform DMA. Some systems are capable of cache-coherent
DMA. On such systems, the cache is often write-through (i.e. stores are
written both to the cache and to host memory), or the cache has special
snooping logic that can detect access to a memory location for which there
is a dirty cache line (which causes the cache to be flushed automatically).
Other systems are not capable of cache-coherent DMA. On these systems,
software must explicitly flush any data caches before memory-to-device
DMA transfers, as well as invalidate soon-to-be-stale cache lines before
device-to-memory DMA.
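.pp
The cache handling described above might be sketched as follows. The
two cache hooks are hypothetical stand-ins that merely count calls; a
real platform would issue its own cache write-back and invalidate
operations in their place.
.(l
```c
#include <stddef.h>

enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Hypothetical platform hooks; the counters stand in for real cache ops. */
int writebacks, invalidates;

void
cache_writeback(const void *p, size_t len)
{
	(void)p; (void)len;
	writebacks++;
}

void
cache_invalidate(const void *p, size_t len)
{
	(void)p; (void)len;
	invalidates++;
}

/* Prepare a buffer for DMA on a platform that may lack coherent DMA. */
void
dma_pre_sync(int coherent, enum dma_dir dir, const void *buf, size_t len)
{
	if (coherent)
		return;			/* hardware keeps the cache consistent */
	if (dir == DMA_TO_DEVICE)
		cache_writeback(buf, len);	/* push dirty lines to memory */
	else
		cache_invalidate(buf, len);	/* discard soon-to-be-stale lines */
}
```
.)l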
.sh 2 "Bus details"
.pp
In addition to hiding the platform-specific DMA details for a single bus,
it is desirable to share as much device driver code as possible for
a device which may attach to multiple busses. A good example is the
BusLogic family of SCSI adapters. This family of devices comes in ISA,
EISA, VESA local bus, and PCI flavors. While there are some bus-specific
details, such as probing and interrupt initialization, the vast majority
of the code that drives this family of devices is identical for each flavor.
.pp
BusLogic SCSI adapters are examples of what are termed
\fIbus masters\fR. That is to say, the device itself performs all bus
handshaking and host memory access during a DMA transfer. No third party
is involved in the transfer. Such devices, when performing a DMA transfer,
present the DMA address on the bus address lines, execute the bus's fetch
or store operation, increment the address, and so forth until the transfer
is complete. Because the device is using the bus address lines, the range
of host physical addresses the device can access is limited by the number
of such lines. On the PCI bus, which has at least 32 address lines, the
device may be able to access the entire physical address space of a 32-bit
architecture, such as the i386. ISA, however, only has 24 address lines.
This means that the device can directly access only 16MB of physical
address space.
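.pp
The arithmetic behind this limit is simple: a bus with \fIn\fR address
lines can address 2^\fIn\fR bytes, so 24 lines yield 16MB. A minimal
sketch of the corresponding range check follows; the function names are
hypothetical.
.(l
```c
#include <stdint.h>

/* Bytes addressable with the given number of address lines (< 64). */
uint64_t
bus_addr_limit(unsigned lines)
{
	return (uint64_t)1 << lines;
}

/* Nonzero if the range [pa, pa + len) lies entirely within reach. */
int
dma_reachable(unsigned lines, uint64_t pa, uint64_t len)
{
	uint64_t limit = bus_addr_limit(lines);

	return len <= limit && pa <= limit - len;
}
```
.)l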
.pp
A common solution to the limited-address-lines problem is a technique
known as \fIDMA bouncing\fR. This technique involves a second memory
area, located in the physical address range accessible by the device,
known as a \fIbounce buffer\fR. In a memory-to-device transfer, the
data is copied by the CPU to the bounce buffer, and the DMA operation is
started. Conversely, in a device-to-memory transfer, the DMA operation is
started, and the CPU then copies the data from the bounce buffer once the
DMA operation has completed.
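.pp
The two directions of a bounced transfer can be sketched as follows.
This is only an illustration: the callback type and the stand-in device
are hypothetical, and the DMA operation is modeled as a synchronous call
that returns once the transfer has completed.
.(l
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: performs a DMA transfer on dma_buf, returns when done. */
typedef void (*dma_start_fn)(void *dma_buf, size_t len);

/* Memory-to-device: copy into the bounce buffer, then start the DMA. */
void
bounce_write(dma_start_fn start, void *bounce, const void *src, size_t len)
{
	memcpy(bounce, src, len);
	start(bounce, len);
}

/* Device-to-memory: run the DMA, then copy out of the bounce buffer. */
void
bounce_read(dma_start_fn start, void *bounce, void *dst, size_t len)
{
	start(bounce, len);
	memcpy(dst, bounce, len);
}

/* Stand-in device with 16 bytes of storage, for demonstration only. */
char fake_dev_mem[16];

void
fake_dev_consume(void *dma_buf, size_t len)
{
	memcpy(fake_dev_mem, dma_buf, len);	/* device reads host memory */
}

void
fake_dev_produce(void *dma_buf, size_t len)
{
	memcpy(dma_buf, fake_dev_mem, len);	/* device writes host memory */
}
```
.)l
.pp
In both directions the caller's buffer is never handed to the device;
only the bounce buffer, which lies within the device's addressable
range, takes part in the DMA operation itself.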
.pp
While simple to implement, DMA bouncing is not the most elegant way to
solve the limited-address-lines problem. On the Alpha, for example,
scatter-gather-mapped DMA may be used to translate the out-of-range
host physical addresses to in-range DMA addresses that the device
may use. This solution tends to offer better performance because it
eliminates data copies, and is less expensive in terms of memory usage.
.pp
Returning to the BusLogic SCSI example, it is undesirable to place
intimate knowledge of direct-mapping, scatter-gather-mapping,
and DMA bouncing in the core device driver. Clearly, an abstraction that
hides these details and presents a consistent interface, regardless of
the DMA mechanism being used, is needed.