.\"	$NetBSD: disk.9,v 1.46 2017/07/03 21:28:48 wiz Exp $
.\"
.\" Copyright (c) 1995, 1996 Jason R. Thorpe.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed for the NetBSD Project
.\"	by Jason R. Thorpe.
.\" 4. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 5, 2017
.Dt DISK 9
.Os
.Sh NAME
.Nm disk ,
.Nm disk_init ,
.Nm disk_attach ,
.Nm disk_begindetach ,
.Nm disk_detach ,
.Nm disk_destroy ,
.Nm disk_wait ,
.Nm disk_busy ,
.Nm disk_unbusy ,
.Nm disk_isbusy ,
.Nm disk_find ,
.Nm disk_set_info
.Nd generic disk framework
.Sh SYNOPSIS
.In sys/types.h
.In sys/disklabel.h
.In sys/disk.h
.Ft void
.Fn disk_init "struct disk *" "const char *name" "const struct dkdriver *driver"
.Ft void
.Fn disk_attach "struct disk *"
.Ft int
.Fn disk_begindetach "struct disk *" "int (*lastclose)(device_t)" "device_t self" "int flags"
.Ft void
.Fn disk_detach "struct disk *"
.Ft void
.Fn disk_destroy "struct disk *"
.Ft void
.Fn disk_wait "struct disk *"
.Ft void
.Fn disk_busy "struct disk *"
.Ft void
.Fn disk_unbusy "struct disk *" "long bcount" "int read"
.Ft bool
.Fn disk_isbusy "struct disk *"
.Ft struct disk *
.Fn disk_find "const char *"
.Ft void
.Fn disk_set_info "device_t" "struct disk *" "const char *type"
.Sh DESCRIPTION
The
.Nx
generic disk framework is designed to provide flexible,
scalable, and consistent handling of disk state and metrics information.
The fundamental component of this framework is the
.Nm disk
structure, which is defined as follows:
.Bd -literal
struct disk {
	TAILQ_ENTRY(disk) dk_link;	/* link in global disklist */
	const char	*dk_name;	/* disk name */
	prop_dictionary_t dk_info;	/* reference to disk-info dictionary */
	int		dk_bopenmask;	/* block devices open */
	int		dk_copenmask;	/* character devices open */
	int		dk_openmask;	/* composite (bopen|copen) */
	int		dk_state;	/* label state   ### */
	int		dk_blkshift;	/* shift to convert DEV_BSIZE to blks */
	int		dk_byteshift;	/* shift to convert bytes to blks */

	/*
	 * Metrics data; note that some metrics may have no meaning
	 * on certain types of disks.
	 */
	struct io_stats	*dk_stats;

	const struct dkdriver *dk_driver;	/* pointer to driver */

	/*
	 * Information required to be the parent of a disk wedge.
	 */
	kmutex_t	dk_rawlock;	/* lock on these fields */
	u_int		dk_rawopens;	/* # of opens of rawvp */
	struct vnode	*dk_rawvp;	/* vnode for the RAW_PART bdev */

	kmutex_t	dk_openlock;	/* lock on these and openmask */
	u_int		dk_nwedges;	/* # of configured wedges */
					/* all wedges on this disk */
	LIST_HEAD(, dkwedge_softc) dk_wedges;

	/*
	 * Disk label information.  Storage for the in-core disk label
	 * must be dynamically allocated, otherwise the size of this
	 * structure becomes machine-dependent.
	 */
	daddr_t		dk_labelsector;		/* sector containing label */
	struct disklabel *dk_label;	/* label */
	struct cpu_disklabel *dk_cpulabel;
};
.Ed
.Pp
The system maintains a global linked-list of all disks attached to the
system.
This list, called
.Nm disklist ,
may grow or shrink over time as disks are dynamically added and removed
from the system.
Drivers which currently make use of the detachment
capability of the framework are the
.Nm ccd ,
.Nm dm ,
and
.Nm vnd
pseudo-device drivers.
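.Pp
The disklist bookkeeping can be pictured with a small user-space model.
This is a sketch under stated assumptions, not the kernel code:
.Vt struct model_disk ,
.Fn model_attach ,
.Fn model_detach
and
.Fn model_find
are hypothetical names, insertion at the tail is an assumption, and an
assertion stands in for the kernel panic.
.Bd -literal
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct disk; only the list link and name. */
struct model_disk {
	TAILQ_ENTRY(model_disk) md_link;	/* link in model disklist */
	const char *md_name;			/* disk name */
};

static TAILQ_HEAD(, model_disk) model_disklist =
    TAILQ_HEAD_INITIALIZER(model_disklist);
static int model_disk_count;

/* Insert the disk into the list and bump the disk count. */
static void
model_attach(struct model_disk *d)
{
	TAILQ_INSERT_TAIL(&model_disklist, d, md_link);
	model_disk_count++;
}

/* Remove the disk from the list and drop the disk count. */
static void
model_detach(struct model_disk *d)
{
	TAILQ_REMOVE(&model_disklist, d, md_link);
	model_disk_count--;
	assert(model_disk_count >= 0);	/* the kernel panics instead */
}

/* Look a disk up by name; NULL if no such disk is attached. */
static struct model_disk *
model_find(const char *name)
{
	struct model_disk *d;

	TAILQ_FOREACH(d, &model_disklist, md_link) {
		if (strcmp(d->md_name, name) == 0)
			return d;
	}
	return NULL;
}
.Ed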
.Pp
The following is a brief description of each function in the framework:
.Bl -tag -width ".Fn disk_set_info"
.It Fn disk_init
Initialize the disk structure.
.It Fn disk_attach
Attach a disk; allocate storage for the disklabel, set the
.Dq attached time
timestamp, insert the disk into the disklist, and increment the
system disk count.
.It Fn disk_begindetach
Check whether the disk is open, and if not, return 0.
If the disk is open, and
.Dv DETACH_FORCE
is not set in
.Fa flags ,
return
.Dv EBUSY .
Otherwise, call the provided
.Fa lastclose
routine
.Po
if not
.Dv NULL
.Pc
and return its exit code.
.It Fn disk_detach
Detach a disk; free storage for the disklabel, remove the disk
from the disklist, and decrement the system disk count.
If the count drops below zero, panic.
.It Fn disk_destroy
Release resources used by the disk structure when it is no longer
required.
.It Fn disk_wait
Disk timings are measured by counting the number of queued
requests (wait counter) and requests issued to the hardware (busy counter)
and keeping a timestamp when the counters change.
The time interval between
two changes of a counter is accumulated into a total and also multiplied
by the counter value and then accumulated into a sum.
Both values can be
used to determine how much time is spent in the driver queue or in-flight
to the hardware as well as the average number of requests in either state.
.Fn disk_wait
increments the disk's wait counter and handles the accumulation.
.It Fn disk_busy
Decrements the disk's wait counter and increments the disk's
.Dq busy counter ,
and handles the accumulation for both counters.
If the wait counter is still zero, it
is assumed that the driver has not been updated to call
.Fn disk_wait ;
in that case only the values from the busy counter are available.
.It Fn disk_unbusy
Decrements the disk's busy counter and handles the accumulation.
The third argument
.Ar read
specifies the direction of I/O;
if non-zero it means reading from the disk,
otherwise it means writing to the disk.
.It Fn disk_isbusy
Returns
.Dv true
if the disk is marked as busy and
.Dv false
if it is not.
.It Fn disk_find
Return a pointer to the disk structure corresponding to the name provided,
or
.Dv NULL
if the disk does not exist.
.It Fn disk_set_info
Set up the disk-info dictionary and other dependent values of the disk
structure.
The driver must have initialized the
.Va dk_geom
member of
.Fa struct disk
with suitable values.
If
.Fa type
is not
.Dv NULL ,
it will be added to the dictionary.
.El
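.Pp
The timing accounting described for
.Fn disk_wait
and
.Fn disk_busy
can be illustrated with a small user-space model of a single counter.
This is a sketch only, not the kernel implementation:
.Vt struct model_counter ,
.Fn model_change
and
.Fn model_average
are hypothetical names, times are plain microsecond counts, and the model
assumes that time is accumulated only while the counter is non-zero.
.Bd -literal
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one counter (wait or busy). */
struct model_counter {
	int	 count;	/* requests currently in this state */
	uint64_t stamp;	/* time of the last change, microseconds */
	uint64_t total;	/* time spent with count > 0 */
	uint64_t sum;	/* sum of interval * count */
};

/* Account for the interval since the last change, then apply delta. */
static void
model_change(struct model_counter *c, uint64_t now, int delta)
{
	uint64_t interval;

	assert(now >= c->stamp);
	interval = now - c->stamp;
	if (c->count > 0) {
		c->total += interval;
		c->sum += interval * (uint64_t)c->count;
	}
	c->count += delta;
	assert(c->count >= 0);
	c->stamp = now;
}

/* Average number of requests in this state over a period. */
static double
model_average(const struct model_counter *c, uint64_t period)
{
	return period == 0 ? 0.0 : (double)c->sum / (double)period;
}
.Ed
.Pp
With requests entering the state at times 0 and 100 and leaving it at
times 150 and 250, the model ends with a total of 250 and a sum of 300,
which corresponds to an average of 1.2 requests over those 250 microseconds.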
.Pp
The functions typically called by device drivers are
.Fn disk_init ,
.Fn disk_attach ,
.Fn disk_begindetach ,
.Fn disk_detach ,
.Fn disk_destroy ,
.Fn disk_wait ,
.Fn disk_busy ,
.Fn disk_unbusy ,
and
.Fn disk_set_info .
The function
.Fn disk_find
is provided as a utility function.
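.Pp
The decision made by
.Fn disk_begindetach ,
as described above, can be summarised by the following user-space model.
It is a sketch, not the kernel source:
.Fn model_begindetach ,
.Dv MODEL_DETACH_FORCE ,
the callback type and the sample callback are hypothetical, the callback
takes a plain pointer rather than a
.Vt device_t ,
and returning 0 when no
.Fa lastclose
routine is supplied is an assumption.
.Bd -literal
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's DETACH_FORCE flag. */
#define MODEL_DETACH_FORCE	0x01

/* Hypothetical last-close callback type. */
typedef int (*model_lastclose_t)(void *self);

/* Sample callback that reports a failure while closing. */
static int
model_lastclose_eio(void *self)
{
	(void)self;
	return EIO;
}

static int
model_begindetach(int openmask, model_lastclose_t lastclose,
    void *self, int flags)
{
	if (openmask == 0)
		return 0;	/* disk is not open */
	if ((flags & MODEL_DETACH_FORCE) == 0)
		return EBUSY;	/* open, and detach is not forced */
	if (lastclose == NULL)
		return 0;	/* assumed: nothing to call */
	return (*lastclose)(self);	/* forced: run last close */
}
.Ed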
.Sh DISK IOCTLS
The following ioctls should be implemented by disk drivers:
.Bl -tag -width "xxxxxx"
.It Dv DIOCGDINFO "struct disklabel"
Get disklabel.
.It Dv DIOCSDINFO "struct disklabel"
Set in-memory disklabel.
.It Dv DIOCWDINFO "struct disklabel"
Set in-memory disklabel and write on-disk disklabel.
.It Dv DIOCGPART "struct partinfo"
Get partition information.
This is used internally.
.It Dv DIOCRFORMAT "struct format_op"
Read format.
.It Dv DIOCWFORMAT "struct format_op"
Write format.
.It Dv DIOCSSTEP "int"
Set step rate.
.It Dv DIOCSRETRIES "int"
Set number of retries.
.It Dv DIOCKLABEL "int"
Specify whether to keep or drop the in-memory disklabel
when the device is closed.
.It Dv DIOCWLABEL "int"
Enable or disable writing to the part of the disk that contains the label.
.It Dv DIOCSBAD "struct dkbad"
Set kernel dkbad.
.It Dv DIOCEJECT "int"
Eject removable disk.
.It Dv DIOCLOCK "int"
Lock or unlock disk pack.
For devices with removable media, locking is intended to prevent
the operator from removing the media.
.It Dv DIOCGDEFLABEL "struct disklabel"
Get default label.
.It Dv DIOCCLRLABEL
Clear disk label.
.It Dv DIOCGCACHE "int"
Get status of disk read and write caches.
The result is a bitmask containing the following values:
.Bl -tag -width DKCACHE_RCHANGE
.It Dv DKCACHE_READ
Read cache enabled.
.It Dv DKCACHE_WRITE
Write(back) cache enabled.
.It Dv DKCACHE_RCHANGE
Read cache enable is changeable.
.It Dv DKCACHE_WCHANGE
Write cache enable is changeable.
.It Dv DKCACHE_SAVE
Cache parameters may be saved, so that they persist across reboots
or device detach/attach cycles.
.El
.It Dv DIOCSCACHE "int"
Set status of disk read and write caches.
The input is a bitmask in the same format as used for
.Dv DIOCGCACHE .
.It Dv DIOCCACHESYNC "int"
Synchronise the disk cache.
This causes information in the disk's write cache (if any)
to be flushed to stable storage.
The argument specifies whether or not to force a flush even if
the kernel believes that there is no outstanding data.
.It Dv DIOCBSLIST "struct disk_badsecinfo"
Get bad sector list.
.It Dv DIOCBSFLUSH
Flush bad sector list.
.It Dv DIOCAWEDGE "struct dkwedge_info"
Add wedge.
.It Dv DIOCGWEDGEINFO "struct dkwedge_info"
Get wedge information.
.It Dv DIOCDWEDGE "struct dkwedge_info"
Delete wedge.
.It Dv DIOCLWEDGES "struct dkwedge_list"
List wedges.
.It Dv DIOCGSTRATEGY "struct disk_strategy"
Get disk buffer queue strategy.
.It Dv DIOCSSTRATEGY "struct disk_strategy"
Set disk buffer queue strategy.
.It Dv DIOCGDISKINFO "struct plistref"
Get disk-info dictionary.
.It Dv DIOCGMEDIASIZE "off_t"
Get disk size in bytes.
.It Dv DIOCGSECTORSIZE "u_int"
Get sector size in bytes.
.El
.Sh USING THE FRAMEWORK
This section describes basic use of the framework
and gives examples of how its functions are called.
The actual implementation of a device driver that uses the framework
may vary.
.Pp
Each device in the system uses a
.Dq softc
structure which contains autoconfiguration and state information for that
device.
In the case of disks, the softc should also contain one instance
of the disk structure, e.g.:
.Bd -literal
struct foo_softc {
	device_t	sc_dev;		/* generic device information */
	struct	disk	sc_dk;		/* generic disk information */
	[ . . . more . . . ]
};
.Ed
.Pp
In order for the system to gather metrics data about a disk, the disk must
be registered with the system.
The
.Fn disk_attach
routine performs all of the functions currently required to register a disk
with the system including allocation of disklabel storage space,
recording of the time since boot that the disk was attached, and insertion
into the disklist.
Note that since this function allocates storage space for the disklabel,
it must be called before the disklabel is read from the media or used in
any other way.
Before
.Fn disk_attach
is called, portions of the disk structure must be initialized with
data specific to that disk.
For example, in the
.Dq foo
disk driver, the following would be performed in the autoconfiguration
.Dq attach
routine:
.Bd -literal
void
fooattach(device_t parent, device_t self, void *aux)
{
	struct foo_softc *sc = device_private(self);
	[ . . . ]

	/* Initialize and attach the disk structure. */
	disk_init(&sc->sc_dk, device_xname(self), &foodkdriver);
	disk_attach(&sc->sc_dk);

	/* Read geometry and fill in pertinent parts of disklabel. */
	/* Initialize geometry values of the disk structure */
	[ . . . ]
	disk_set_info(self, &sc->sc_dk, type);
}
.Ed
.Pp
The
.Nm foodkdriver
above is the disk's
.Dq driver
switch.
This switch currently includes pointers to several driver entry points,
where only the
.Nm d_strategy
entry point is used by the disk framework.
This switch needs to have global scope and should be initialized as follows:
.Bd -literal
void    (foostrategy)(struct buf *);
void    (foominphys)(struct buf *);
int     (fooopen)(dev_t, int, int, struct lwp *);
int     (fooclose)(dev_t, int, int, struct lwp *);
int     (foo_discard)(device_t, off_t, off_t);
int     (foo_diskstart)(device_t, struct buf *);
void    (foo_iosize)(device_t, int *);
int     (foo_dumpblocks)(device_t, void *, daddr_t, int);
int     (foo_lastclose)(device_t);
int     (foo_firstopen)(device_t, dev_t, int, int);
int     (foo_label)(device_t, struct disklabel *);

const struct dkdriver foodkdriver = {
	.d_open = fooopen,
	.d_close = fooclose,
	.d_strategy = foostrategy,
	.d_minphys = foominphys,
	.d_discard = foo_discard,
	.d_diskstart = foo_diskstart,	/* optional */
	.d_dumpblocks = foo_dumpblocks,	/* optional */
	.d_iosize = foo_iosize,		/* optional */
	.d_firstopen = foo_firstopen,	/* optional */
	.d_lastclose = foo_lastclose,	/* optional */
	.d_label = foo_label,		/* optional */
};
.Ed
.Pp
Once the disk is attached, metrics may be gathered on that disk.
In order to gather metrics data, the driver must tell the framework when
the disk queues, starts and stops operations.
This functionality is provided by the
.Fn disk_wait ,
.Fn disk_busy
and
.Fn disk_unbusy
routines.
Because
.Vt struct disk
is part of the device driver's private data, it needs to be guarded.
Mutual exclusion must be provided by the driver, because
.Fn disk_wait ,
.Fn disk_busy
and
.Fn disk_unbusy
are not thread safe.
The
.Fn disk_busy
routine should be called immediately before a command to the disk is
sent, e.g.:
.Bd -literal
void
foostrategy(struct buf *bp)
{
	[ . . . ]

	mutex_enter(&sc->sc_dk_mtx);
	disk_wait(&sc->sc_dk);

	/* Put buffer onto drive's transfer queue */

	mutex_exit(&sc->sc_dk_mtx);

	foostart(sc);
}

void
foostart(struct foo_softc *sc)
{
	[ . . . ]

	/* Get buffer from drive's transfer queue. */
	[ . . . ]

	/* Build command to send to drive. */
	[ . . . ]

	/* Tell the disk framework we're going busy. */
	mutex_enter(&sc->sc_dk_mtx);
	disk_busy(&sc->sc_dk);
	mutex_exit(&sc->sc_dk_mtx);

	/* Send command to the drive. */
	[ . . . ]
}
.Ed
.Pp
The routine
.Fn disk_unbusy
performs some consistency checks, such as ensuring that the calls to
.Fn disk_busy
and
.Fn disk_unbusy
are balanced.
It also performs the final steps of the metrics calculation.
A byte count is added to the disk's running total, and if greater than
zero, the number of transfers the disk has performed is incremented.
The third argument
.Ar read
specifies the direction of I/O;
if non-zero it means reading from the disk,
otherwise it means writing to the disk.
.Bd -literal
void
foodone(struct foo_xfer *xfer)
{
	struct foo_softc *sc = (struct foo_softc *)xfer->xf_softc;
	struct buf *bp = xfer->xf_buf;
	long nbytes;
	[ . . . ]

	/*
	 * Get number of bytes transferred.  If there is no buf
	 * associated with the xfer, we are being called at the
	 * end of a non-I/O command.
	 */
	if (bp == NULL)
		nbytes = 0;
	else
		nbytes = bp->b_bcount - bp->b_resid;

	[ . . . ]

	mutex_enter(&sc->sc_dk_mtx);
	/* Notify the disk framework that we've completed the transfer. */
	disk_unbusy(&sc->sc_dk, nbytes,
	    bp != NULL ? bp->b_flags & B_READ : 0);
	mutex_exit(&sc->sc_dk_mtx);

	[ . . . ]
}
.Ed
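.Pp
The bookkeeping attributed to
.Fn disk_unbusy
above can be modelled in user space as follows.
This is a sketch under stated assumptions, not the kernel implementation:
.Vt struct model_stats ,
.Fn model_busy
and
.Fn model_unbusy
are hypothetical names, keeping separate read and write totals is an
assumption, and an assertion stands in for the balance check.
.Bd -literal
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-disk metrics; not the kernel's struct io_stats. */
struct model_stats {
	int	 busy;		/* commands in flight */
	uint64_t rbytes;	/* bytes read */
	uint64_t wbytes;	/* bytes written */
	uint64_t rxfer;		/* read transfers */
	uint64_t wxfer;		/* write transfers */
};

/* A command is about to be sent to the drive. */
static void
model_busy(struct model_stats *s)
{
	s->busy++;
}

/* A command completed; bcount bytes moved, read != 0 for reads. */
static void
model_unbusy(struct model_stats *s, long bcount, int read)
{
	assert(s->busy > 0);	/* busy/unbusy calls must balance */
	s->busy--;

	if (bcount <= 0)
		return;		/* non-I/O command: no transfer */
	if (read) {
		s->rbytes += (uint64_t)bcount;
		s->rxfer++;
	} else {
		s->wbytes += (uint64_t)bcount;
		s->wxfer++;
	}
}
.Ed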
.Pp
.Fn disk_isbusy
reports the status of the disk device: it returns
.Dv true
if the device is currently busy and
.Dv false
if it is not.
Like
.Fn disk_wait ,
.Fn disk_busy
and
.Fn disk_unbusy ,
it requires explicit locking by the caller.
.Sh CODE REFERENCES
The disk framework itself is implemented within the file
.Pa sys/kern/subr_disk.c .
Data structures and function prototypes for the framework are located in
.Pa sys/sys/disk.h .
.Pp
The
.Nx
machine-independent SCSI disk and CD-ROM drivers use the
disk framework.
They are located in
.Pa sys/dev/scsipi/sd.c
and
.Pa sys/dev/scsipi/cd.c .
.Pp
The
.Nx
.Nm ccd ,
.Nm dm ,
and
.Nm vnd
drivers use the detachment capability of the framework.
They are located in
.Pa sys/dev/ccd.c ,
.Pa sys/dev/vnd.c ,
and
.Pa sys/dev/dm/device-mapper.c .
.Sh SEE ALSO
.Xr ccd 4 ,
.Xr dm 4 ,
.Xr vnd 4 ,
.Xr dksubr 9
.Sh HISTORY
The
.Nx
generic disk framework appeared in
.Nx 1.2 .
.Sh AUTHORS
The
.Nx
generic disk framework was architected and implemented by
.An Jason R. Thorpe
.Aq Mt thorpej@NetBSD.org .