.\"	$NetBSD: nqnfs.me,v 1.2 1998/01/09 06:41:43 perry Exp $
.\"
.\" Copyright (c) 1993 The Usenix Association. All rights reserved.
.\"
.\" This document is derived from software contributed to Berkeley by
.\" Rick Macklem at The University of Guelph with the permission of
.\" the Usenix Association.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)nqnfs.me	8.1 (Berkeley) 4/20/94
.\"
.lp
.nr PS 12
.ps 12
Reprinted with permission from the "Proceedings of the Winter 1994 Usenix
Conference", January 1994, San Francisco, CA, Copyright The Usenix
Association.
.nr PS 14
.ps 14
.sp
.ce
\fBNot Quite NFS, Soft Cache Consistency for NFS\fR
.nr PS 12
.ps 12
.sp
.ce
\fIRick Macklem\fR
.ce
\fIUniversity of Guelph\fR
.sp
.nr PS 12
.ps 12
.ce
\fBAbstract\fR
.nr PS 10
.ps 10
.pp
There are some constraints inherent in the NFS\(tm\(mo protocol
that result in performance limitations
for high performance
workstation environments.
This paper discusses an NFS-like protocol named Not Quite NFS (NQNFS),
designed to address some of these limitations.
This protocol provides full cache consistency during normal
operation, while permitting more effective client-side caching in an
effort to improve performance.
There are also a variety of minor protocol changes, in order to resolve
various NFS issues.
The emphasis is on observed performance of a
preliminary implementation of the protocol, in order to show
how well this design works
and to suggest possible areas for further improvement.
.sh 1 "Introduction"
.pp
It has been observed that
overall workstation performance has not been scaling with
processor speed and that file system I/O is a limiting factor [Ousterhout90].
Ousterhout
notes
that a principal challenge for operating system developers is the
decoupling of system calls from their underlying I/O operations, in order
to improve average system call response times.
For distributed file systems, every synchronous Remote Procedure Call (RPC)
takes a minimum of a few milliseconds and, as such, is analogous to an
underlying I/O operation.
This suggests that client caching with a very good
hit ratio for read type operations, along with asynchronous writing, is required in order to avoid delays waiting for RPC replies.
However, the NFS protocol requires that the server be stateless\**
.(f
\**The server must not require any state that may be lost due to a crash, to
function correctly.
.)f
and does not provide any explicit mechanism for client cache
consistency, putting
constraints on how the client may cache data.
This paper describes an NFS-like protocol that includes a cache consistency
component designed to enhance client caching performance. It does provide
full consistency under normal operation, but without requiring that hard
state information be maintained on the server.
Design tradeoffs were made towards simplicity and
high performance over cache consistency under abnormal conditions.
The protocol design uses a variation of Leases [Gray89]
to provide state on the server that does not need to be recovered after a
crash.
.pp
The protocol also includes changes designed to address other limitations
of NFS in a modern workstation environment.
The use of TCP transport is optionally available to avoid
the pitfalls of Sun RPC over UDP transport when running across an internetwork [Nowicki89].
Kerberos [Steiner88] support is available
to do proper user authentication, in order to provide improved security and
arbitrary client to server user ID mappings.
There are also a variety of other changes to accommodate large file systems,
such as 64bit file sizes and offsets, as well as lifting the 8Kbyte I/O size
limit.
The remainder of this paper gives an overview of the protocol, highlighting
performance related components, followed by an evaluation of resultant performance
for the 4.4BSD implementation.
.sh 1 "Distributed File Systems and Caching"
.pp
Clients using distributed file systems cache recently-used data in order
to reduce the number of synchronous server operations, and therefore improve
average response times for system calls.
Unfortunately, maintaining consistency between these caches is a problem
whenever write sharing occurs; that is, when a process on a client writes
to a file and one or more processes on other client(s) read the file.
If the writer closes the file before any reader(s) open the file for reading,
this is called sequential write sharing. Both the Andrew ITC file system
[Howard88] and NFS [Sandberg85] maintain consistency for sequential write
sharing by requiring the writer to push all the writes through to the
server on close and having readers check to see if the file has been
modified upon open. If the file has been modified, the client throws away
all cached data for that file, as it is now stale.
NFS implementations typically detect file modification by checking a cached
copy of the file's modification time; since this cached value is often
several seconds out of date and only has a resolution of one second, an NFS
client often uses stale cached data for some time after the file has
been updated on the server.
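.pp
As a purely illustrative aid, the following C fragment sketches the
conventional open-time check just described; it is not taken from any
particular NFS implementation and the structure and names are invented.
.sp
.nf
#include <time.h>

/* Hypothetical record of client-cached file state. */
struct cache_info {
	time_t	cached_mtime;	/* server mtime when data was cached */
	int	valid;		/* non-zero while cached data is usable */
};

/*
 * Open-time check: if the server's modification time differs from the
 * cached one, throw the cached data away.  Since the cached attributes
 * may be several seconds old and mtime has one-second resolution,
 * stale data can still be read for a while after another client writes.
 */
void
nfs_open_check(struct cache_info *ci, time_t server_mtime)
{
	if (ci->cached_mtime != server_mtime) {
		ci->valid = 0;
		ci->cached_mtime = server_mtime;
	}
}
.fi
.sp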
.pp
A more difficult case is concurrent write sharing, where write operations are intermixed
with read operations.
Consistency for this case, often referred to as "full cache consistency,"
requires that a reader always receives the most recently written data.
Neither NFS nor the Andrew ITC file system maintain consistency for this
case.
The simplest mechanism for maintaining full cache consistency is the one
used by Sprite [Nelson88], which disables all client caching of the
file whenever concurrent write sharing might occur.
There are other mechanisms described in the literature [Kent87a,
Burrows88], but they appeared to be too elaborate for incorporation
into NQNFS (for example, Kent's requires specialized hardware).
NQNFS differs from Sprite in the way it
detects write sharing. The Sprite server maintains a list of files currently open
by the various clients and detects write sharing when a file open request
for writing is received and the file is already open for reading
(or vice versa).
This list of open files is hard state information that must be recovered
after a server crash, which is a significant problem in its own
right [Mogul93, Welch90].
.pp
The approach used by NQNFS is a variant of the Leases mechanism [Gray89].
In this model, the server issues to a client a promise, referred to as a
"lease," that the client may cache a specific object without fear of
conflict.
A lease has a limited duration and must be renewed by the client if it
wishes to continue to cache the object.
In NQNFS, clients hold short-term (up to one minute) leases on files
for reading or writing.
The leases are analogous to entries in the open file list, except that
they expire after the lease term unless renewed by the client.
As such, one minute after issuing the last lease there are no current
leases and therefore no lease records to be recovered after a crash, hence
the term "soft server state."
.pp
A related design consideration is the way client writing is done.
Synchronous writing requires that all writes be pushed through to the server
during the write system call.
This is the simplest variant, from a consistency point of view, since the
server always has the most recently written data. It also permits any write
errors, such as "file system out of space" to be propagated back to the
client's process via the write system call return.
Unfortunately this approach limits the client write rate, based on server write
performance and client/server RPC round trip time (RTT).
.pp
An alternative to this is delayed writing, where the write system call returns
as soon as the data is cached on the client and the data is written to the
server sometime later.
This permits client writing to occur at the rate of local storage access
up to the size of the local cache.
Also, for cases where file truncation/deletion occurs shortly after writing,
the write to the server may be avoided since the data has already been
deleted, reducing server write load.
There are some obvious drawbacks to this approach.
For any Sprite-like system to maintain
full consistency, the server must "callback" to the client to cause the
delayed writes to be written back to the server when write sharing is about to
occur.
There are also problems with the propagation of errors
back to the client process that issued the write system call.
The reason for this is that
the system call has already returned without reporting an error and the
process may also have already terminated.
As well, there is a risk of the loss of recently written data if the client
crashes before the data is written back to the server.
.pp
A compromise between these two alternatives is asynchronous writing, where
the write to the server is initiated during the write system call but the write system
call returns before the write completes.
This approach minimizes the risk of data loss due to a client crash, but negates
the possibility of reducing server write load by throwing writes away when
a file is truncated or deleted.
.pp
NFS implementations usually do a mix of asynchronous and delayed writing
but push all writes to the server upon close, in order to maintain open/close
consistency.
Pushing the delayed writes on close
negates much of the performance advantage of delayed writing, since the
delays that were avoided in the write system calls are observed in the close
system call.
Akin to Sprite, the NQNFS protocol does delayed writing in an effort to achieve
good client performance and uses a callback mechanism to maintain full cache
consistency.
.sh 1 "Related Work"
.pp
There has been a great deal of effort put into improving the performance and
consistency of the NFS protocol. This work can be put into two categories.
The first category is implementation enhancements for the NFS protocol and
the second involves modifications to the protocol itself.
.pp
The work done on implementation enhancements has attacked two problem areas,
NFS server write performance and RPC transport problems.
Server write performance is a major problem for NFS, in part due to the
requirement to push all writes to the server upon close and in part due
to the fact that, for writes, all data and meta-data must be committed to
non-volatile storage before the server replies to the write RPC.
The Prestoserve\(tm\(dg
[Moran90]
system uses non-volatile RAM as a buffer for recently written data on the server,
so that the write RPC replies can be returned to the client before the data is written to the
disk surface.
Write gathering [Juszczak94] is a software technique used on the server where a write
RPC request is delayed for a short time in the hope that another contiguous
write request will arrive, so that they can be merged into one write operation.
Since the replies to all of the merged writes are not returned to the client until the write
operation is completed, this delay does not violate the protocol.
When write operations are merged, the number of disk writes can be reduced,
improving server write performance.
Although either of the above reduces write RPC response time for the server,
it cannot be reduced to zero, and so, any client side caching mechanism
that reduces write RPC load or client dependence on server RPC response time
should still improve overall performance.
Good client side caching should be complementary to these server techniques,
although client performance improvements as a result of caching may be less
dramatic when these techniques are used.
.pp
In NFS, each Sun RPC request is packaged in a UDP datagram for transmission
to the server. A timer is started, and if a timeout occurs before the corresponding
RPC reply is received, the RPC request is retransmitted.
There are two problems with this model.
First, when a retransmit timeout occurs, the RPC may be redone, instead of
simply retransmitting the RPC request message to the server. A recent-request
cache can be used on the server to minimize the negative impact of redoing
RPCs [Juszczak89].
The second problem is that a large UDP datagram, such as a read request or
write reply, must be fragmented by IP and if any one IP fragment is lost in
transit, the entire UDP datagram is lost [Kent87]. Since entire requests and replies
are packaged in a single UDP datagram, this puts an upper bound on the read/write
data size (8 kbytes).
.pp
Adjusting the retransmit timeout interval dynamically and applying a
congestion window on outstanding requests has been shown to be of some help
[Nowicki89] with the retransmission problem.
An alternative to this is to use TCP transport to deliver the RPC messages
reliably [Macklem90], and one of the performance results in this paper
shows the effect of doing so.
.pp
Srinivasan and Mogul [Srinivasan89] enhanced the NFS protocol to use the Sprite cache
consistency algorithm in an effort to improve performance and to provide
full client cache consistency.
This experimental implementation demonstrated significantly better
performance than NFS, but suffered from a lack of crash recovery support.
The NQNFS protocol design borrowed heavily from this work, but differed
from the Sprite algorithm by using Leases instead of file open state
to detect write sharing.
The decision to use Leases was made primarily to avoid the crash recovery
problem.
More recent work by the Sprite group [Baker91] and Mogul [Mogul93] has
addressed the crash recovery problem, making this design tradeoff more
questionable now.
.pp
Sun has recently updated the NFS protocol to Version 3 [SUN93], using some
changes similar to NQNFS to address various issues. The Version 3 protocol
uses 64bit file sizes and offsets, provides a Readdir_and_Lookup RPC and
an access RPC.
It also provides cache hints, to permit a client to be able to determine
whether a file modification is the result of that client's write or some
other client's write.
It would be possible to add either Spritely NFS or NQNFS support for cache
consistency to the NFS Version 3 protocol.
.sh 1 "NQNFS Consistency Protocol and Recovery"
.pp
The NQNFS cache consistency protocol uses a somewhat Sprite-like [Nelson88]
mechanism, but is based on Leases [Gray89] instead of hard server state information
about open files.
The basic principle is that the server disables client caching of files whenever
concurrent write sharing could occur, by performing a server-to-client
callback,
forcing the client to flush its caches and to do all subsequent I/O on the file with
synchronous RPCs.
A Sprite server maintains a record of the open state of files for
all clients and uses this to determine when concurrent write sharing might
occur.
This \fIopen state\fR information might also be referred to as an infinite-term
lease for the file, with explicit lease cancellation.
NQNFS, on the other hand, uses a short-term lease that expires due to timeout
after a maximum of one minute, unless explicitly renewed by the client.
The fundamental difference is that an NQNFS client must keep renewing
a lease to use cached data whereas a Sprite client assumes the data is valid until canceled
by the server
or the file is closed.
Using leases permits the server to remain "stateless," since the soft
state information, which consists of the set of current leases, is
moot after one minute, when all the leases expire.
.pp
Whenever a client wishes to access a file's data it must hold one of
three types of lease: read-caching, write-caching or non-caching.
The latter type requires that all file operations be done synchronously with
the server via the appropriate RPCs.
.pp
A read-caching lease allows for client data caching but no modifications
may be done.
It may, however, be shared between multiple clients. Diagram 1 shows a typical
read-caching scenario. The vertical solid black lines depict the lease records.
Note that the time lines are not drawn to scale, since a client/server
interaction will normally take less than one hundred milliseconds, whereas the
normal lease duration is thirty seconds.
Every lease includes a \fImodrev\fR value, which changes upon every modification
of the file. It may be used to check to see if data cached on the client is
still current.
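.pp
The following sketch suggests how a client might apply the \fImodrev\fR value
when a lease is acquired or renewed; the types and names are hypothetical and
serve only to illustrate the check.
.sp
.nf
#include <sys/types.h>

/* Hypothetical client-side record for a cached file. */
struct file_cache {
	u_quad_t	cached_modrev;	/* modrev when data was cached */
	int		valid;		/* non-zero while cache is usable */
};

/*
 * On lease acquisition or renewal, compare the modrev returned with
 * the lease against the one recorded when the data was cached.  If
 * they differ, the file was modified while no lease was held, so the
 * cached blocks are stale and must be discarded.
 */
void
modrev_check(struct file_cache *fp, u_quad_t lease_modrev)
{
	if (fp->cached_modrev != lease_modrev) {
		fp->valid = 0;
		fp->cached_modrev = lease_modrev;
	}
}
.fi
.sp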
.pp
A write-caching lease permits delayed write caching,
but requires that all data be pushed to the server when the lease expires
or is terminated by an eviction callback.
When a write-caching lease has almost expired, the client will attempt to
extend the lease if the file is still open, but is required to push the delayed writes to the server
if renewal fails (as depicted by diagram 2).
The writes may not arrive at the server until after the write lease has
expired on the client, but this does not result in a consistency problem,
so long as the write lease is still valid on the server.
Note that, in diagram 2, the lease record on the server remains current after
the expiry time, due to the conditions mentioned in section 5.
If a write RPC is done on the server after the write lease has expired on
the server, this could be considered an error since consistency could be
lost, but it is not handled as such by NQNFS.
.pp
Diagram 3 depicts how read and write leases are replaced by a non-caching
lease when there is the potential for write sharing.
.(z
.sp
.PS
.ps
.ps 50
line from 0.738,5.388 to 1.238,5.388
.ps
.ps 10
dashwid = 0.050i
line dashed from 1.488,10.075 to 1.488,5.450
line dashed from 2.987,10.075 to 2.987,5.450
line dashed from 4.487,10.075 to 4.487,5.450
.ps
.ps 50
line from 4.487,7.013 to 4.487,5.950
line from 2.987,7.700 to 2.987,5.950 to 2.987,6.075
line from 1.488,7.513 to 1.488,5.950
line from 2.987,9.700 to 2.987,8.325
line from 1.488,9.450 to 1.488,8.325
.ps
.ps 10
line from 2.987,6.450 to 4.487,6.200
line from 4.385,6.192 to 4.487,6.200 to 4.393,6.241
line from 4.487,6.888 to 2.987,6.575
line from 3.080,6.620 to 2.987,6.575 to 3.090,6.571
line from 2.987,7.263 to 4.487,7.013
line from 4.385,7.004 to 4.487,7.013 to 4.393,7.054
line from 4.487,7.638 to 2.987,7.388
line from 3.082,7.429 to 2.987,7.388 to 3.090,7.379
line from 2.987,6.888 to 1.488,6.575
line from 1.580,6.620 to 1.488,6.575 to 1.590,6.571
line from 1.488,7.200 to 2.987,6.950
line from 2.885,6.942 to 2.987,6.950 to 2.893,6.991
line from 2.987,7.700 to 1.488,7.513
line from 1.584,7.550 to 1.488,7.513 to 1.590,7.500
line from 1.488,8.012 to 2.987,7.763
line from 2.885,7.754 to 2.987,7.763 to 2.893,7.804
line from 2.987,9.012 to 1.488,8.825
line from 1.584,8.862 to 1.488,8.825 to 1.590,8.813
line from 1.488,9.325 to 2.987,9.137
line from 2.885,9.125 to 2.987,9.137 to 2.891,9.175
line from 2.987,9.637 to 1.488,9.450
line from 1.584,9.487 to 1.488,9.450 to 1.590,9.438
line from 1.488,9.887 to 2.987,9.700
line from 2.885,9.688 to 2.987,9.700 to 2.891,9.737
.ps
.ps 12
.ft
.ft R
"Lease valid on machine" at 1.363,5.296 ljust
"with same modrev" at 1.675,7.421 ljust
"miss)" at 2.612,9.233 ljust
"(cache" at 2.300,9.358 ljust
.ps
.ps 14
"Diagram #1: Read Caching Leases" at 0.738,5.114 ljust
"Client B" at 4.112,10.176 ljust
"Server" at 2.612,10.176 ljust
"Client A" at 0.925,10.176 ljust
.ps
.ps 12
"from cache" at 4.675,6.546 ljust
"Read syscalls" at 4.675,6.796 ljust
"Reply" at 3.737,6.108 ljust
"(cache miss)" at 3.675,6.421 ljust
"Read  req" at 3.737,6.608 ljust
"to lease" at 3.112,6.796 ljust
"Client B added" at 3.112,6.983 ljust
"Reply" at 3.237,7.296 ljust
"Read + lease req" at 3.175,7.671 ljust
"Read syscall" at 4.675,7.608 ljust
"Reply" at 1.675,6.796 ljust
"miss)" at 2.487,7.108 ljust
"Read req (cache" at 1.675,7.233 ljust
"from cache" at 0.425,6.296 ljust
"Read  syscalls" at 0.425,6.546 ljust
"cache" at 0.425,6.858 ljust
"so can still" at 0.425,7.108 ljust
"Modrev  same" at 0.425,7.358 ljust
"Reply" at 1.675,7.671 ljust
"Get lease req" at 1.675,8.108 ljust
"Read syscall" at 0.425,7.983 ljust
"Lease times out" at 0.425,8.296 ljust
"from cache" at 0.425,9.046 ljust
"Read syscalls" at 0.425,9.296 ljust
"for Client A" at 3.112,9.296 ljust
"Read caching lease" at 3.112,9.483 ljust
"Reply" at 1.675,8.983 ljust
"Read req" at 1.675,9.358 ljust
"Reply" at 1.675,9.608 ljust
"Read + lease req" at 1.675,9.921 ljust
"Read syscall" at 0.425,9.921 ljust
.ps
.ft
.PE
.sp
.)z
.(z
.sp
.PS
.ps
.ps 50
line from 1.175,5.700 to 1.300,5.700
line from 0.738,5.700 to 1.175,5.700
line from 2.987,6.638 to 2.987,6.075
.ps
.ps 10
dashwid = 0.050i
line dashed from 2.987,6.575 to 2.987,5.950
line dashed from 1.488,6.575 to 1.488,5.888
.ps
.ps 50
line from 2.987,9.762 to 2.987,6.638
line from 1.488,9.450 to 1.488,7.700
.ps
.ps 10
line from 2.987,6.763 to 1.488,6.575
line from 1.584,6.612 to 1.488,6.575 to 1.590,6.563
line from 1.488,7.013 to 2.987,6.825
line from 2.885,6.813 to 2.987,6.825 to 2.891,6.862
line from 2.987,7.325 to 1.488,7.075
line from 1.582,7.116 to 1.488,7.075 to 1.590,7.067
line from 1.488,7.700 to 2.987,7.388
line from 2.885,7.383 to 2.987,7.388 to 2.895,7.432
line from 2.987,8.575 to 1.488,8.325
line from 1.582,8.366 to 1.488,8.325 to 1.590,8.317
line from 1.488,8.887 to 2.987,8.637
line from 2.885,8.629 to 2.987,8.637 to 2.893,8.679
line from 2.987,9.637 to 1.488,9.450
line from 1.584,9.487 to 1.488,9.450 to 1.590,9.438
line from 1.488,9.887 to 2.987,9.762
line from 2.886,9.746 to 2.987,9.762 to 2.890,9.796
line dashed from 2.987,10.012 to 2.987,6.513
line dashed from 1.488,10.012 to 1.488,6.513
.ps
.ps 12
.ft
.ft R
"write" at 4.237,5.921 ljust
"Lease valid on machine" at 1.425,5.733 ljust
.ps
.ps 14
"Diagram #2: Write Caching Lease" at 0.738,5.551 ljust
"Server" at 2.675,10.114 ljust
"Client A" at 1.113,10.114 ljust
.ps
.ps 12
"seconds after last" at 3.112,5.921 ljust
"Expires write_slack" at 3.112,6.108 ljust
"due to write activity" at 3.112,6.608 ljust
"Expiry delayed" at 3.112,6.796 ljust
"Lease times out" at 3.112,7.233 ljust
"Lease renewed" at 3.175,8.546 ljust
"Lease for client A" at 3.175,9.358 ljust
"Write caching" at 3.175,9.608 ljust
"Reply" at 1.675,6.733 ljust
"Write req" at 1.988,7.046 ljust
"Reply" at 1.675,7.233 ljust
"Write req" at 1.675,7.796 ljust
"Lease expires" at 0.487,7.733 ljust
"Close syscall" at 0.487,8.108 ljust
"lease granted" at 1.675,8.546 ljust
"Get write lease" at 1.675,8.921 ljust
"before expiry" at 0.487,8.608 ljust
"Lease renewal" at 0.487,8.796 ljust
"syscalls" at 0.487,9.046 ljust
"Delayed write" at 0.487,9.233 ljust
"lease granted" at 1.675,9.608 ljust
"Get write lease req" at 1.675,9.921 ljust
"Write syscall" at 0.487,9.858 ljust
.ps
.ft
.PE
.sp
.)z
.(z
.sp
.PS
.ps
.ps 50
line from 0.613,2.638 to 1.238,2.638
line from 1.488,4.075 to 1.488,3.638
line from 2.987,4.013 to 2.987,3.575
line from 4.487,4.013 to 4.487,3.575
.ps
.ps 10
line from 2.987,3.888 to 4.487,3.700
line from 4.385,3.688 to 4.487,3.700 to 4.391,3.737
line from 4.487,4.138 to 2.987,3.950
line from 3.084,3.987 to 2.987,3.950 to 3.090,3.938
line from 2.987,4.763 to 4.487,4.450
line from 4.385,4.446 to 4.487,4.450 to 4.395,4.495
.ps
.ps 50
line from 4.487,4.438 to 4.487,4.013
.ps
.ps 10
line from 4.487,5.138 to 2.987,4.888
line from 3.082,4.929 to 2.987,4.888 to 3.090,4.879
.ps
.ps 50
line from 4.487,6.513 to 4.487,5.513
line from 4.487,6.513 to 4.487,6.513 to 4.487,5.513
line from 2.987,5.450 to 2.987,5.200
line from 1.488,5.075 to 1.488,4.075
line from 2.987,5.263 to 2.987,4.013
line from 2.987,7.700 to 2.987,5.325
line from 4.487,7.575 to 4.487,6.513
line from 1.488,8.512 to 1.488,8.075
line from 2.987,8.637 to 2.987,8.075
line from 2.987,9.637 to 2.987,8.825
line from 1.488,9.450 to 1.488,8.950
.ps
.ps 10
line from 2.987,4.450 to 1.488,4.263
line from 1.584,4.300 to 1.488,4.263 to 1.590,4.250
line from 1.488,4.888 to 2.987,4.575
line from 2.885,4.571 to 2.987,4.575 to 2.895,4.620
line from 2.987,5.263 to 1.488,5.075
line from 1.584,5.112 to 1.488,5.075 to 1.590,5.063
line from 4.487,5.513 to 2.987,5.325
line from 3.084,5.362 to 2.987,5.325 to 3.090,5.313
line from 2.987,5.700 to 4.487,5.575
line from 4.386,5.558 to 4.487,5.575 to 4.390,5.608
line from 4.487,6.013 to 2.987,5.825
line from 3.084,5.862 to 2.987,5.825 to 3.090,5.813
line from 2.987,6.200 to 4.487,6.075
line from 4.386,6.058 to 4.487,6.075 to 4.390,6.108
line from 4.487,6.450 to 2.987,6.263
line from 3.084,6.300 to 2.987,6.263 to 3.090,6.250
line from 2.987,6.700 to 4.487,6.513
line from 4.385,6.500 to 4.487,6.513 to 4.391,6.550
line from 1.488,6.950 to 2.987,6.763
line from 2.885,6.750 to 2.987,6.763 to 2.891,6.800
line from 2.987,7.700 to 4.487,7.575
line from 4.386,7.558 to 4.487,7.575 to 4.390,7.608
line from 4.487,7.950 to 2.987,7.763
line from 3.084,7.800 to 2.987,7.763 to 3.090,7.750
line from 2.987,8.637 to 1.488,8.512
line from 1.585,8.546 to 1.488,8.512 to 1.589,8.496
line from 1.488,8.887 to 2.987,8.700
line from 2.885,8.688 to 2.987,8.700 to 2.891,8.737
line from 2.987,9.637 to 1.488,9.450
line from 1.584,9.487 to 1.488,9.450 to 1.590,9.438
line from 1.488,9.950 to 2.987,9.762
line from 2.885,9.750 to 2.987,9.762 to 2.891,9.800
dashwid = 0.050i
line dashed from 4.487,10.137 to 4.487,2.825
line dashed from 2.987,10.137 to 2.987,2.825
line dashed from 1.488,10.137 to 1.488,2.825
.ps
.ps 12
.ft
.ft R
"(not cached)" at 4.612,3.858 ljust
.ps
.ps 14
"Diagram #3: Write sharing case" at 0.613,2.239 ljust
.ps
.ps 12
"Write syscall" at 4.675,7.546 ljust
"Read syscall" at 0.550,9.921 ljust
.ps
.ps 14
"Lease valid on machine" at 1.363,2.551 ljust
.ps
.ps 12
"(can still cache)" at 1.675,8.171 ljust
"Reply" at 3.800,3.858 ljust
"Write" at 3.175,4.046 ljust
"writes" at 4.612,4.046 ljust
"synchronous" at 4.612,4.233 ljust
"write syscall" at 4.675,5.108 ljust
"non-caching lease" at 3.175,4.296 ljust
"Reply " at 3.175,4.483 ljust
"req" at 3.175,4.983 ljust
"Get write lease" at 3.175,5.108 ljust
"Vacated msg" at 3.175,5.483 ljust
"to the server" at 4.675,5.858 ljust
"being flushed to" at 4.675,6.046 ljust
"Delayed writes" at 4.675,6.233 ljust
.ps
.ps 16
"Server" at 2.675,10.182 ljust
"Client B" at 3.925,10.182 ljust
"Client A" at 0.863,10.182 ljust
.ps
.ps 12
"(not cached)" at 0.550,4.733 ljust
"Read data" at 0.550,4.921 ljust
"Reply  data" at 1.675,4.421 ljust
"Read request" at 1.675,4.921 ljust
"lease" at 1.675,5.233 ljust
"Reply non-caching" at 1.675,5.421 ljust
"Reply" at 3.737,5.733 ljust
"Write" at 3.175,5.983 ljust
"Reply" at 3.737,6.171 ljust
"Write" at 3.175,6.421 ljust
"Eviction Notice" at 3.175,6.796 ljust
"Get read lease" at 1.675,7.046 ljust
"Read syscall" at 0.550,6.983 ljust
"being cached" at 4.675,7.171 ljust
"Delayed writes" at 4.675,7.358 ljust
"lease" at 3.175,7.233 ljust
"Reply write caching" at 3.175,7.421 ljust
"Get  write lease" at 3.175,7.983 ljust
"Write syscall" at 4.675,7.983 ljust
"with same modrev" at 1.675,8.358 ljust
"Lease" at 0.550,8.171 ljust
"Renewed" at 0.550,8.358 ljust
"Reply" at 1.675,8.608 ljust
"Get Lease Request" at 1.675,8.983 ljust
"Read syscall" at 0.550,8.733 ljust
"from cache" at 0.550,9.108 ljust
"Read syscall" at 0.550,9.296 ljust
"Reply " at 1.675,9.671 ljust
"plus lease" at 2.050,9.983 ljust
"Read Request" at 1.675,10.108 ljust
.ps
.ft
.PE
.sp
.)z
A write-caching lease is not used in the Stanford V Distributed System [Gray89],
since synchronous writing is always used. A side effect of this change
is that the five to ten second lease duration recommended by Gray was found
to be insufficient to achieve good performance for the write-caching lease.
Experimentation showed that thirty seconds was about optimal for cases where
the client and server are connected to the same local area network, so
thirty seconds is the default lease duration for NQNFS.
A maximum of twice that value is permitted, since Gray showed that for some
network topologies, a larger lease duration functions better.
Although there is an explicit get_lease RPC defined for the protocol,
most lease requests are piggybacked onto the other RPCs to minimize the
additional overhead introduced by leasing.
.sh 2 "Rationale"
.pp
Leasing was chosen over hard server state information for the following
reasons:
.ip 1.
The server must maintain state information about all current
client leases.
Since at most one lease is allocated for each RPC and the leases expire
after their lease term,
the upper bound on the number of current leases is the product of the
lease term and the server RPC rate.
In practice, it has been observed that less than 10% of RPCs request new leases
and since most leases have a term of thirty seconds, the following rule of
thumb should estimate the number of server lease records:
.sp
.nf
	Number of Server Lease Records \(eq 0.1 * 30 * RPC rate
.fi
.sp
Since each lease record occupies 64 bytes of server memory, storing the lease
records should not be a serious problem.
If a server has exhausted lease storage, it can simply wait a few seconds
for a lease to expire and free up a record.
On the other hand, a Sprite-like server must store records for all files
currently open by all clients, which can require significant storage for
a large, heavily loaded server.
In [Mogul93], it is proposed that a mechanism vaguely similar to paging could be
used to deal with this for Spritely NFS, but this
appears to introduce a fair amount of complexity and may limit the
usefulness of open records for storing other state information, such
as file locks.
.ip 2.
After a server crashes, it would normally have to recover lease records for
the outstanding leases; however, if it simply waits
until all leases have expired, there is no state left to recover.
The server must wait for the maximum lease duration of one minute, and it must serve
all outstanding write requests resulting from terminated write-caching
leases before issuing new leases. The one minute delay can be overlapped with
file system consistency checking (e.g. fsck).
Because no state must be recovered, a lease-based server, like an NFS server,
avoids the problem of state recovery after a crash.
.sp
There can, however, be problems during crash recovery
because of a potentially large number of write backs due to terminated
write-caching leases.
One of these problems is a "recovery storm" [Baker91], which could occur when
the server is overloaded by the number of write RPC requests.
The NQNFS protocol deals with this by replying
with a return status code called
try_again_later to all
RPC requests (except write) until the write requests subside.
At this time, there has not been sufficient testing of server crash
recovery while under heavy server load to determine if the try_again_later
reply is a sufficient solution to the problem.
The other problem is that consistency will be lost if other RPCs are performed
before all of the write backs for terminated write-caching leases have completed.
This is handled by servicing only write RPCs until no write RPC requests
have arrived
for write_slack seconds, where write_slack is set to several times
the client timeout retransmit interval,
at which time it is assumed all clients have had an opportunity to send their writes
to the server.
.ip 3.
Another advantage of leasing is that, since leases are required at times when other I/O operations occur,
lease requests can almost always be piggybacked on other RPCs, avoiding some of the
overhead associated with the explicit open and close RPCs required by a Sprite-like system.
Compared with Sprite cache consistency,
this can result in a significantly lower RPC load (see table #1).
.sh 1 "Limitations of the NQNFS Protocol"
.pp
There is a serious risk when leasing is used for delayed write
caching.
If the server is simply too busy to service a lease renewal before a write-caching
lease terminates, the client will not be able to push the write
data to the server before the lease has terminated, resulting in
inconsistency.
Note that the danger of inconsistency occurs when the server assumes that
a write-caching lease has terminated before the client has
had the opportunity to write the data back to the server.
In an effort to avoid this problem, the NQNFS server does not assume that
a write-caching lease has terminated until three conditions are met:
.sp
.(l
1 - clock time > (expiry time + clock skew)
2 - there is at least one server daemon (nfsd) waiting for an RPC request
3 - no write RPCs received for leased file within write_slack after the corrected expiry time
.)l
.lp
The first condition ensures that the lease has expired on the client.
The clock_skew, by default three seconds, must be
set to a value larger than the maximum time-of-day clock error that is likely to occur
during the maximum lease duration.
The second condition attempts to ensure that the client
is not waiting for replies to any writes that are still queued for service by
an nfsd. The third condition tries to guarantee that the client has
transmitted all write requests to the server, since write_slack is set to
several times the client's timeout retransmit interval.
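.pp
The three conditions can be summarized by the following C sketch; the
structure, the nfsd_idle flag and the field names are invented for
illustration, with the default values taken from the text above.
.sp
.nf
#include <time.h>

#define CLOCK_SKEW	3	/* seconds, the default clock_skew */

/* Hypothetical server-side state for a write-caching lease. */
struct wlease {
	time_t	expiry;		/* lease expiry time */
	time_t	last_write;	/* time of the last write RPC on the file */
};

/*
 * The server treats a write-caching lease as terminated only when all
 * three conditions hold; nfsd_idle stands for the check that at least
 * one nfsd is waiting for an RPC request.
 */
int
wlease_terminated(const struct wlease *lp, time_t now,
    int nfsd_idle, int write_slack)
{
	if (now <= lp->expiry + CLOCK_SKEW)
		return (0);	/* 1: may still be valid on the client */
	if (!nfsd_idle)
		return (0);	/* 2: writes may still be queued for service */
	if (now - lp->last_write < write_slack)
		return (0);	/* 3: writes are still arriving */
	return (1);
}
.fi
.sp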
.pp
There are also certain file system semantics that are problematic for both NFS and NQNFS,
due to the
lack of state information maintained by the
server. If a file is unlinked on one client while open on another it will
be removed from the file server, resulting in failed file accesses on the
client that has the file open.
If the file system on the server is out of space or the client user's disk
quota has been exceeded, a delayed write can fail long after the write system
call was successfully completed.
With NFS this error will be detected by the close system call, since
the delayed writes are pushed upon close. With NQNFS however, the delayed write
RPC may not occur until after the close system call, possibly even after the process
has exited.
Therefore,
if a process must check for write errors,
a system call such as \fIfsync\fR must be used.
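.pp
For example, a process that needs to know whether its delayed writes succeeded
can force them to the server and collect the error status with \fIfsync\fR,
along the following lines:
.sp
.nf
#include <err.h>
#include <unistd.h>

/* Push any delayed writes for fd and report an error if one failed. */
void
flush_file(int fd)
{
	if (fsync(fd) == -1)
		err(1, "fsync");	/* e.g. out of space or over quota */
}
.fi
.sp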
.pp
Another problem occurs when a process on one client is
running an executable file
and a process on another client starts to write to the file. The read lease on
the first client is terminated by the server, but the client has no recourse but
to terminate the process, since the process is already in progress on the old
executable.
.pp
The NQNFS protocol does not support file locking, since a file lock would have
to involve hard state information that must be recovered after a crash.
.sh 1 "Other NQNFS Protocol Features"
.pp
NQNFS also includes a variety of minor modifications to the NFS protocol, in an
attempt to address various limitations.
The protocol uses 64bit file sizes and offsets in order to handle large files.
TCP transport may be used as an alternative to UDP
for cases where UDP does not perform well.
Transport mechanisms
such as TCP also permit the use of much larger read/write data sizes,
which might improve performance in certain environments.
.pp
The NQNFS protocol replaces the Readdir RPC with a Readdir_and_Lookup
RPC that returns the file handle and attributes for each file in the
directory as well as name and file id number.
This additional information may then be loaded into the lookup and file-attribute
caches on the client.
Thus, for cases such as "ls -l", the \fIstat\fR system calls can be performed
locally without doing any lookup or getattr RPCs.
Another additional RPC is the Access RPC that checks for file
accessibility against the server. This is necessary since in some cases the
client user ID is mapped to a different user on the server and doing the
access check locally on the client using file attributes and client credentials is
not correct.
One case where this becomes necessary is when the NQNFS mount point is using
Kerberos authentication, where the Kerberos authentication ticket is translated
to credentials on the server that are mapped to the client side user id.
For further details on the protocol, see [Macklem93].
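.pp
As a purely illustrative aid, one entry of a Readdir_and_Lookup reply can be
thought of as carrying roughly the information declared below; this C sketch
is hypothetical and is not the wire format, which is defined in [Macklem93].
.sp
.nf
#include <sys/types.h>
#include <time.h>

struct rdl_attrs {			/* subset of the file attributes */
	u_quad_t	size;		/* 64 bit file size */
	time_t		mtime;		/* modification time */
};

/* Hypothetical decoded form of one Readdir_and_Lookup reply entry. */
struct rdl_entry {
	u_quad_t	fileid;		/* file id number */
	char		name[256];	/* entry name */
	u_char		fh[32];		/* file handle (size illustrative) */
	struct rdl_attrs attrs;		/* attributes for the entry */
};
.fi
.sp
Loading the file handle and attributes from each such entry into the client's
lookup and file-attribute caches is what allows a later \fIstat\fR of the file
to be satisfied without further RPCs.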
.sh 1 "Performance"
.pp
In order to evaluate the effectiveness of the NQNFS protocol,
a benchmark was used that was
designed to typify
real work on the client workstation.
Benchmarks, such as Laddis [Wittle93], that perform server load characterization
are not appropriate for this work, since it is primarily client caching
efficiency that needs to be evaluated.
Since these tests are measuring overall client system performance and
not just the performance of the file system,
each sequence of runs was performed on identical hardware and operating system in order to factor out the system
components affecting performance other than the file system protocol.
.pp
The equipment used for all the benchmarks consists of members of the DECstation\(tm\(dg
family of workstations using the MIPS\(tm\(sc RISC architecture.
The operating system running on these systems was a pre-release version of
4.4BSD Unix\(tm\(dd.
For all benchmarks, the file server was a DECstation 2100 (10 MIPS) with 8Mbytes of
memory and a local RZ23 SCSI disk (27msec average access time).
The clients ranged in speed from DECstation 2100s
to a DECstation 5000/25, and always ran with six block I/O daemons
and a 4Mbyte buffer cache, except for the test runs where the
buffer cache size was the independent variable.
In all cases /tmp was mounted on the local SCSI disk\**, all machines were
attached to the same uncongested Ethernet, and all ran in single user mode during the benchmarks.
.(f
\**Testing using the 4.4BSD MFS [McKusick90] resulted in slightly degraded performance,
probably since the machines only had 16Mbytes of memory, and so paging
increased.
.)f
Unless noted otherwise, test runs used UDP RPC transport
and the results given are the average values of four runs.
.pp
The benchmark used is the Modified Andrew Benchmark (MAB)
[Ousterhout90],
which is a slightly modified version of the benchmark used to characterize
performance of the Andrew ITC file system [Howard88].
The MAB was set up with the executable binaries in the remote mounted file
system and the final load step was commented out, due to a linkage problem
during testing under 4.4BSD.
Therefore, these results are not directly comparable to other reported MAB
results.
The MAB is made up of five distinct phases:
.sp
.ip "1." 10
Makes five directories (no significant cost)
.ip "2." 10
Copy a file system subtree to a working directory
.ip "3." 10
Get file attributes (stat) of all the working files
.ip "4." 10
Search for strings (grep) in the files
.ip "5." 10
Compile a library of C sources and archive them
.lp
Of the five phases, the fifth is by far the largest and is the one affected most
by client caching mechanisms.
The results for phase #1 are invariant over all
the caching mechanisms.
.sh 2 "Buffer Cache Size Tests"
.pp
The first experiment was done to see what effect changing the size of the
buffer cache would have on client performance. A single DECstation 5000/25
was used to do a series of runs of MAB with different buffer cache sizes
for four variations of the file system protocol. The four variations are
as follows:
.ip "Case 1:" 10
NFS - The NFS protocol as implemented in 4.4BSD
.ip "Case 2:" 10
Leases - The NQNFS protocol using leases for cache consistency
.ip "Case 3:" 10
Leases, Rdirlookup - The NQNFS protocol using leases for cache consistency
and with the readdir RPC replaced by Readdir_and_Lookup
.ip "Case 4:" 10
Leases, Attrib leases, Rdirlookup - The NQNFS protocol using leases for
cache consistency, with the readdir
RPC replaced by the Readdir_and_Lookup,
and requiring a valid lease not only for file-data access, but also for file-attribute access.
.lp
As can be seen in figure 1, the buffer cache achieves about optimal
performance for the range of two to ten megabytes in size. At eleven
megabytes in size, the system pages heavily and the runs did not
complete in a reasonable time. Even at 64Kbytes, the buffer cache improves
performance over no buffer cache by a significant margin of 136-148 seconds
versus 239 seconds.
This may be due, in part, to the fact that the Compile Phase of the MAB
uses a rather small working set of file data.
All variants of NQNFS achieve about
the same performance, running around 30% faster than NFS, with a slightly
larger difference for large buffer cache sizes.
Based on these results, all remaining tests were run with the buffer cache
size set to 4Mbytes.
Although I do not know what causes the local peak in the curves between 0.5 and 2 megabytes,
there is some indication that contention for buffer cache blocks, between the update process
(which pushes delayed writes to the server every thirty seconds) and the I/O
system calls, may be involved.
.(z
.PS
.ps
.ps 10
dashwid = 0.050i
line dashed from 0.900,7.888 to 4.787,7.888
line dashed from 0.900,7.888 to 0.900,10.262
line from 0.900,7.888 to 0.963,7.888
line from 4.787,7.888 to 4.725,7.888
line from 0.900,8.188 to 0.963,8.188
line from 4.787,8.188 to 4.725,8.188
line from 0.900,8.488 to 0.963,8.488
line from 4.787,8.488 to 4.725,8.488
line from 0.900,8.775 to 0.963,8.775
line from 4.787,8.775 to 4.725,8.775
line from 0.900,9.075 to 0.963,9.075
line from 4.787,9.075 to 4.725,9.075
line from 0.900,9.375 to 0.963,9.375
line from 4.787,9.375 to 4.725,9.375
line from 0.900,9.675 to 0.963,9.675
line from 4.787,9.675 to 4.725,9.675
line from 0.900,9.963 to 0.963,9.963
line from 4.787,9.963 to 4.725,9.963
line from 0.900,10.262 to 0.963,10.262
line from 4.787,10.262 to 4.725,10.262
line from 0.900,7.888 to 0.900,7.950
line from 0.900,10.262 to 0.900,10.200
line from 1.613,7.888 to 1.613,7.950
line from 1.613,10.262 to 1.613,10.200
line from 2.312,7.888 to 2.312,7.950
line from 2.312,10.262 to 2.312,10.200
line from 3.025,7.888 to 3.025,7.950
line from 3.025,10.262 to 3.025,10.200
line from 3.725,7.888 to 3.725,7.950
line from 3.725,10.262 to 3.725,10.200
line from 4.438,7.888 to 4.438,7.950
line from 4.438,10.262 to 4.438,10.200
line from 0.900,7.888 to 4.787,7.888
line from 4.787,7.888 to 4.787,10.262
line from 4.787,10.262 to 0.900,10.262
line from 0.900,10.262 to 0.900,7.888
line from 3.800,8.775 to 4.025,8.775
line from 0.925,10.088 to 0.925,10.088
line from 0.925,10.088 to 0.938,9.812
line from 0.938,9.812 to 0.988,9.825
line from 0.988,9.825 to 1.075,9.838
line from 1.075,9.838 to 1.163,9.938
line from 1.163,9.938 to 1.250,9.838
line from 1.250,9.838 to 1.613,9.825
line from 1.613,9.825 to 2.312,9.750
line from 2.312,9.750 to 3.025,9.713
line from 3.025,9.713 to 3.725,9.850
line from 3.725,9.850 to 4.438,9.875
dashwid = 0.037i
line dotted from 3.800,8.625 to 4.025,8.625
line dotted from 0.925,9.912 to 0.925,9.912
line dotted from 0.925,9.912 to 0.938,9.887
line dotted from 0.938,9.887 to 0.988,9.713
line dotted from 0.988,9.713 to 1.075,9.562
line dotted from 1.075,9.562 to 1.163,9.562
line dotted from 1.163,9.562 to 1.250,9.562
line dotted from 1.250,9.562 to 1.613,9.675
line dotted from 1.613,9.675 to 2.312,9.363
line dotted from 2.312,9.363 to 3.025,9.375
line dotted from 3.025,9.375 to 3.725,9.387
line dotted from 3.725,9.387 to 4.438,9.450
line dashed from 3.800,8.475 to 4.025,8.475
line dashed from 0.925,10.000 to 0.925,10.000
line dashed from 0.925,10.000 to 0.938,9.787
line dashed from 0.938,9.787 to 0.988,9.650
line dashed from 0.988,9.650 to 1.075,9.537
line dashed from 1.075,9.537 to 1.163,9.613
line dashed from 1.163,9.613 to 1.250,9.800
line dashed from 1.250,9.800 to 1.613,9.488
line dashed from 1.613,9.488 to 2.312,9.375
line dashed from 2.312,9.375 to 3.025,9.363
line dashed from 3.025,9.363 to 3.725,9.325
line dashed from 3.725,9.325 to 4.438,9.438
dashwid = 0.075i
line dotted from 3.800,8.325 to 4.025,8.325
line dotted from 0.925,9.963 to 0.925,9.963
line dotted from 0.925,9.963 to 0.938,9.750
line dotted from 0.938,9.750 to 0.988,9.662
line dotted from 0.988,9.662 to 1.075,9.613
line dotted from 1.075,9.613 to 1.163,9.613
line dotted from 1.163,9.613 to 1.250,9.700
line dotted from 1.250,9.700 to 1.613,9.438
line dotted from 1.613,9.438 to 2.312,9.463
line dotted from 2.312,9.463 to 3.025,9.312
line dotted from 3.025,9.312 to 3.725,9.387
line dotted from 3.725,9.387 to 4.438,9.425
.ps
.ps -1
.ft
.ft I
"0" at 0.825,7.810 rjust
"20" at 0.825,8.110 rjust
"40" at 0.825,8.410 rjust
"60" at 0.825,8.697 rjust
"80" at 0.825,8.997 rjust
"100" at 0.825,9.297 rjust
"120" at 0.825,9.597 rjust
"140" at 0.825,9.885 rjust
"160" at 0.825,10.185 rjust
"0" at 0.900,7.660
"2" at 1.613,7.660
"4" at 2.312,7.660
"6" at 3.025,7.660
"8" at 3.725,7.660
"10" at 4.438,7.660
"Time (sec)" at 0.150,8.997
"Buffer Cache Size (MBytes)" at 2.837,7.510
"Figure #1: MAB Phase 5 (compile)" at 2.837,10.335
"NFS" at 3.725,8.697 rjust
"Leases" at 3.725,8.547 rjust
"Leases, Rdirlookup" at 3.725,8.397 rjust
"Leases, Attrib leases, Rdirlookup" at 3.725,8.247 rjust
.ps
.ft
.PE
.)z
.sh 2 "Multiple Client Load Tests"
.pp
During preliminary runs of the MAB, it was observed that the server RPC
counts were reduced significantly by NQNFS as compared to NFS (table 1).
(Spritely NFS and Ultrix\(tm4.3/NFS numbers were taken from [Mogul93]
and are not directly comparable, due to numerous differences in the
experimental setup including deletion of the load step from phase 5.)
This suggests
that the NQNFS protocol might scale better with
respect to the number of clients accessing the server.
The experiment described in this section
ran the MAB on one to ten clients concurrently, to observe the
effects of heavier server load.
The clients were started at roughly the same time by pressing all the
<return> keys together and, although not synchronized beyond that point,
all clients would finish the test run within about two seconds of each
other.
This was not a realistic load of N active clients, but it did
result in a reproducible increasing client load on the server.
The results for the four variants
are plotted in figures 2-5.
.(z
.ps -1
.R
.TS
box, center;
c s s s s s s s
c c c c c c c c
l | n n n n n n n.
Table #1: MAB RPC Counts
RPC	Getattr	Read	Write	Lookup	Other	GetLease/Open-Close	Total
_
BSD/NQNFS	277	139	306	575	294	127	1718
BSD/NFS	1210	506	451	489	238	0	2894
Spritely NFS	259	836	192	535	306	1467	3595
Ultrix4.3/NFS	1225	1186	476	810	305	0	4002
.TE
.ps
.)z
.pp
For the MAB benchmark, the NQNFS protocol reduces the RPC counts significantly,
but with a minimum of extra overhead (the GetLease/Open-Close count).
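As a rough check of this claim, the following small C fragment (illustrative
only) recomputes the per-operation change in server RPC counts from the
BSD/NFS and BSD/NQNFS rows of table 1; the figures are taken directly from
the table, and the NQNFS total also includes the 127 GetLease/Open-Close RPCs
that have no NFS counterpart.
.(l
/*
 * Illustrative only: per-operation change in server RPC counts,
 * recomputed from the BSD/NFS and BSD/NQNFS rows of table 1.
 */
#include <stdio.h>

int
main(void)
{
        const char *op[] = { "Getattr", "Read", "Write", "Lookup",
            "Other", "Total" };
        const double nfs[]   = { 1210, 506, 451, 489, 238, 2894 };
        const double nqnfs[] = {  277, 139, 306, 575, 294, 1718 };
        int i;

        for (i = 0; i < 6; i++) {
                printf("%-8s NFS %4.0f  NQNFS %4.0f  change %+6.1f%%",
                    op[i], nfs[i], nqnfs[i],
                    100.0 * (nqnfs[i] - nfs[i]) / nfs[i]);
                puts("");
        }
        return (0);
}
.)l
The write RPC count, for example, drops by about 32%, which is consistent
with the roughly 30% reduction in server write load noted below where servers
with fast write RPCs are discussed.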
.(z
.PS
.ps
.ps 10
dashwid = 0.050i
line dashed from 0.900,7.888 to 4.787,7.888
line dashed from 0.900,7.888 to 0.900,10.262
line from 0.900,7.888 to 0.963,7.888
line from 4.787,7.888 to 4.725,7.888
line from 0.900,8.225 to 0.963,8.225
line from 4.787,8.225 to 4.725,8.225
line from 0.900,8.562 to 0.963,8.562
line from 4.787,8.562 to 4.725,8.562
line from 0.900,8.900 to 0.963,8.900
line from 4.787,8.900 to 4.725,8.900
line from 0.900,9.250 to 0.963,9.250
line from 4.787,9.250 to 4.725,9.250
line from 0.900,9.588 to 0.963,9.588
line from 4.787,9.588 to 4.725,9.588
line from 0.900,9.925 to 0.963,9.925
line from 4.787,9.925 to 4.725,9.925
line from 0.900,10.262 to 0.963,10.262
line from 4.787,10.262 to 4.725,10.262
line from 0.900,7.888 to 0.900,7.950
line from 0.900,10.262 to 0.900,10.200
line from 1.613,7.888 to 1.613,7.950
line from 1.613,10.262 to 1.613,10.200
line from 2.312,7.888 to 2.312,7.950
line from 2.312,10.262 to 2.312,10.200
line from 3.025,7.888 to 3.025,7.950
line from 3.025,10.262 to 3.025,10.200
line from 3.725,7.888 to 3.725,7.950
line from 3.725,10.262 to 3.725,10.200
line from 4.438,7.888 to 4.438,7.950
line from 4.438,10.262 to 4.438,10.200
line from 0.900,7.888 to 4.787,7.888
line from 4.787,7.888 to 4.787,10.262
line from 4.787,10.262 to 0.900,10.262
line from 0.900,10.262 to 0.900,7.888
line from 3.800,8.900 to 4.025,8.900
line from 1.250,8.325 to 1.250,8.325
line from 1.250,8.325 to 1.613,8.500
line from 1.613,8.500 to 2.312,8.825
line from 2.312,8.825 to 3.025,9.175
line from 3.025,9.175 to 3.725,9.613
line from 3.725,9.613 to 4.438,10.012
dashwid = 0.037i
line dotted from 3.800,8.750 to 4.025,8.750
line dotted from 1.250,8.275 to 1.250,8.275
line dotted from 1.250,8.275 to 1.613,8.412
line dotted from 1.613,8.412 to 2.312,8.562
line dotted from 2.312,8.562 to 3.025,9.088
line dotted from 3.025,9.088 to 3.725,9.375
line dotted from 3.725,9.375 to 4.438,10.000
line dashed from 3.800,8.600 to 4.025,8.600
line dashed from 1.250,8.250 to 1.250,8.250
line dashed from 1.250,8.250 to 1.613,8.438
line dashed from 1.613,8.438 to 2.312,8.637
line dashed from 2.312,8.637 to 3.025,9.088
line dashed from 3.025,9.088 to 3.725,9.525
line dashed from 3.725,9.525 to 4.438,10.075
dashwid = 0.075i
line dotted from 3.800,8.450 to 4.025,8.450
line dotted from 1.250,8.262 to 1.250,8.262
line dotted from 1.250,8.262 to 1.613,8.425
line dotted from 1.613,8.425 to 2.312,8.613
line dotted from 2.312,8.613 to 3.025,9.137
line dotted from 3.025,9.137 to 3.725,9.512
line dotted from 3.725,9.512 to 4.438,9.988
.ps
.ps -1
.ft
.ft I
"0" at 0.825,7.810 rjust
"20" at 0.825,8.147 rjust
"40" at 0.825,8.485 rjust
"60" at 0.825,8.822 rjust
"80" at 0.825,9.172 rjust
"100" at 0.825,9.510 rjust
"120" at 0.825,9.847 rjust
"140" at 0.825,10.185 rjust
"0" at 0.900,7.660
"2" at 1.613,7.660
"4" at 2.312,7.660
"6" at 3.025,7.660
"8" at 3.725,7.660
"10" at 4.438,7.660
"Time (sec)" at 0.150,8.997
"Number of Clients" at 2.837,7.510
"Figure #2: MAB Phase 2 (copying)" at 2.837,10.335
"NFS" at 3.725,8.822 rjust
"Leases" at 3.725,8.672 rjust
"Leases, Rdirlookup" at 3.725,8.522 rjust
"Leases, Attrib leases, Rdirlookup" at 3.725,8.372 rjust
.ps
.ft
.PE
.)z
.(z
.PS
.ps
.ps 10
dashwid = 0.050i
line dashed from 0.900,7.888 to 4.787,7.888
line dashed from 0.900,7.888 to 0.900,10.262
line from 0.900,7.888 to 0.963,7.888
line from 4.787,7.888 to 4.725,7.888
line from 0.900,8.188 to 0.963,8.188
line from 4.787,8.188 to 4.725,8.188
line from 0.900,8.488 to 0.963,8.488
line from 4.787,8.488 to 4.725,8.488
line from 0.900,8.775 to 0.963,8.775
line from 4.787,8.775 to 4.725,8.775
line from 0.900,9.075 to 0.963,9.075
line from 4.787,9.075 to 4.725,9.075
line from 0.900,9.375 to 0.963,9.375
line from 4.787,9.375 to 4.725,9.375
line from 0.900,9.675 to 0.963,9.675
line from 4.787,9.675 to 4.725,9.675
line from 0.900,9.963 to 0.963,9.963
line from 4.787,9.963 to 4.725,9.963
line from 0.900,10.262 to 0.963,10.262
line from 4.787,10.262 to 4.725,10.262
line from 0.900,7.888 to 0.900,7.950
line from 0.900,10.262 to 0.900,10.200
line from 1.613,7.888 to 1.613,7.950
line from 1.613,10.262 to 1.613,10.200
line from 2.312,7.888 to 2.312,7.950
line from 2.312,10.262 to 2.312,10.200
line from 3.025,7.888 to 3.025,7.950
line from 3.025,10.262 to 3.025,10.200
line from 3.725,7.888 to 3.725,7.950
line from 3.725,10.262 to 3.725,10.200
line from 4.438,7.888 to 4.438,7.950
line from 4.438,10.262 to 4.438,10.200
line from 0.900,7.888 to 4.787,7.888
line from 4.787,7.888 to 4.787,10.262
line from 4.787,10.262 to 0.900,10.262
line from 0.900,10.262 to 0.900,7.888
line from 3.800,8.775 to 4.025,8.775
line from 1.250,8.975 to 1.250,8.975
line from 1.250,8.975 to 1.613,8.963
line from 1.613,8.963 to 2.312,8.988
line from 2.312,8.988 to 3.025,9.037
line from 3.025,9.037 to 3.725,9.062
line from 3.725,9.062 to 4.438,9.100
dashwid = 0.037i
line dotted from 3.800,8.625 to 4.025,8.625
line dotted from 1.250,9.312 to 1.250,9.312
line dotted from 1.250,9.312 to 1.613,9.287
line dotted from 1.613,9.287 to 2.312,9.675
line dotted from 2.312,9.675 to 3.025,9.262
line dotted from 3.025,9.262 to 3.725,9.738
line dotted from 3.725,9.738 to 4.438,9.512
line dashed from 3.800,8.475 to 4.025,8.475
line dashed from 1.250,9.400 to 1.250,9.400
line dashed from 1.250,9.400 to 1.613,9.287
line dashed from 1.613,9.287 to 2.312,9.575
line dashed from 2.312,9.575 to 3.025,9.300
line dashed from 3.025,9.300 to 3.725,9.613
line dashed from 3.725,9.613 to 4.438,9.512
dashwid = 0.075i
line dotted from 3.800,8.325 to 4.025,8.325
line dotted from 1.250,9.400 to 1.250,9.400
line dotted from 1.250,9.400 to 1.613,9.412
line dotted from 1.613,9.412 to 2.312,9.700
line dotted from 2.312,9.700 to 3.025,9.537
line dotted from 3.025,9.537 to 3.725,9.938
line dotted from 3.725,9.938 to 4.438,9.812
.ps
.ps -1
.ft
.ft I
"0" at 0.825,7.810 rjust
"5" at 0.825,8.110 rjust
"10" at 0.825,8.410 rjust
"15" at 0.825,8.697 rjust
"20" at 0.825,8.997 rjust
"25" at 0.825,9.297 rjust
"30" at 0.825,9.597 rjust
"35" at 0.825,9.885 rjust
"40" at 0.825,10.185 rjust
"0" at 0.900,7.660
"2" at 1.613,7.660
"4" at 2.312,7.660
"6" at 3.025,7.660
"8" at 3.725,7.660
"10" at 4.438,7.660
"Time (sec)" at 0.150,8.997
"Number of Clients" at 2.837,7.510
"Figure #3: MAB Phase 3 (stat/find)" at 2.837,10.335
"NFS" at 3.725,8.697 rjust
"Leases" at 3.725,8.547 rjust
"Leases, Rdirlookup" at 3.725,8.397 rjust
"Leases, Attrib leases, Rdirlookup" at 3.725,8.247 rjust
.ps
.ft
.PE
.)z
.(z
.PS
.ps
.ps 10
dashwid = 0.050i
line dashed from 0.900,7.888 to 4.787,7.888
line dashed from 0.900,7.888 to 0.900,10.262
line from 0.900,7.888 to 0.963,7.888
line from 4.787,7.888 to 4.725,7.888
line from 0.900,8.188 to 0.963,8.188
line from 4.787,8.188 to 4.725,8.188
line from 0.900,8.488 to 0.963,8.488
line from 4.787,8.488 to 4.725,8.488
line from 0.900,8.775 to 0.963,8.775
line from 4.787,8.775 to 4.725,8.775
line from 0.900,9.075 to 0.963,9.075
line from 4.787,9.075 to 4.725,9.075
line from 0.900,9.375 to 0.963,9.375
line from 4.787,9.375 to 4.725,9.375
line from 0.900,9.675 to 0.963,9.675
line from 4.787,9.675 to 4.725,9.675
line from 0.900,9.963 to 0.963,9.963
line from 4.787,9.963 to 4.725,9.963
line from 0.900,10.262 to 0.963,10.262
line from 4.787,10.262 to 4.725,10.262
line from 0.900,7.888 to 0.900,7.950
line from 0.900,10.262 to 0.900,10.200
line from 1.613,7.888 to 1.613,7.950
line from 1.613,10.262 to 1.613,10.200
line from 2.312,7.888 to 2.312,7.950
line from 2.312,10.262 to 2.312,10.200
line from 3.025,7.888 to 3.025,7.950
line from 3.025,10.262 to 3.025,10.200
line from 3.725,7.888 to 3.725,7.950
line from 3.725,10.262 to 3.725,10.200
line from 4.438,7.888 to 4.438,7.950
line from 4.438,10.262 to 4.438,10.200
line from 0.900,7.888 to 4.787,7.888
line from 4.787,7.888 to 4.787,10.262
line from 4.787,10.262 to 0.900,10.262
line from 0.900,10.262 to 0.900,7.888
line from 3.800,8.775 to 4.025,8.775
line from 1.250,9.412 to 1.250,9.412
line from 1.250,9.412 to 1.613,9.425
line from 1.613,9.425 to 2.312,9.463
line from 2.312,9.463 to 3.025,9.600
line from 3.025,9.600 to 3.725,9.875
line from 3.725,9.875 to 4.438,10.075
dashwid = 0.037i
line dotted from 3.800,8.625 to 4.025,8.625
line dotted from 1.250,9.450 to 1.250,9.450
line dotted from 1.250,9.450 to 1.613,9.438
line dotted from 1.613,9.438 to 2.312,9.438
line dotted from 2.312,9.438 to 3.025,9.525
line dotted from 3.025,9.525 to 3.725,9.550
line dotted from 3.725,9.550 to 4.438,9.662
line dashed from 3.800,8.475 to 4.025,8.475
line dashed from 1.250,9.438 to 1.250,9.438
line dashed from 1.250,9.438 to 1.613,9.412
line dashed from 1.613,9.412 to 2.312,9.450
line dashed from 2.312,9.450 to 3.025,9.500
line dashed from 3.025,9.500 to 3.725,9.613
line dashed from 3.725,9.613 to 4.438,9.675
dashwid = 0.075i
line dotted from 3.800,8.325 to 4.025,8.325
line dotted from 1.250,9.387 to 1.250,9.387
line dotted from 1.250,9.387 to 1.613,9.600
line dotted from 1.613,9.600 to 2.312,9.625
line dotted from 2.312,9.625 to 3.025,9.738
line dotted from 3.025,9.738 to 3.725,9.850
line dotted from 3.725,9.850 to 4.438,9.800
.ps
.ps -1
.ft
.ft I
"0" at 0.825,7.810 rjust
"5" at 0.825,8.110 rjust
"10" at 0.825,8.410 rjust
"15" at 0.825,8.697 rjust
"20" at 0.825,8.997 rjust
"25" at 0.825,9.297 rjust
"30" at 0.825,9.597 rjust
"35" at 0.825,9.885 rjust
"40" at 0.825,10.185 rjust
"0" at 0.900,7.660
"2" at 1.613,7.660
"4" at 2.312,7.660
"6" at 3.025,7.660
"8" at 3.725,7.660
"10" at 4.438,7.660
"Time (sec)" at 0.150,8.997
"Number of Clients" at 2.837,7.510
"Figure #4: MAB Phase 4 (grep/wc/find)" at 2.837,10.335
"NFS" at 3.725,8.697 rjust
"Leases" at 3.725,8.547 rjust
"Leases, Rdirlookup" at 3.725,8.397 rjust
"Leases, Attrib leases, Rdirlookup" at 3.725,8.247 rjust
.ps
.ft
.PE
.)z
.(z
.PS
.ps
.ps 10
dashwid = 0.050i
line dashed from 0.900,7.888 to 4.787,7.888
line dashed from 0.900,7.888 to 0.900,10.262
line from 0.900,7.888 to 0.963,7.888
line from 4.787,7.888 to 4.725,7.888
line from 0.900,8.150 to 0.963,8.150
line from 4.787,8.150 to 4.725,8.150
line from 0.900,8.412 to 0.963,8.412
line from 4.787,8.412 to 4.725,8.412
line from 0.900,8.675 to 0.963,8.675
line from 4.787,8.675 to 4.725,8.675
line from 0.900,8.938 to 0.963,8.938
line from 4.787,8.938 to 4.725,8.938
line from 0.900,9.213 to 0.963,9.213
line from 4.787,9.213 to 4.725,9.213
line from 0.900,9.475 to 0.963,9.475
line from 4.787,9.475 to 4.725,9.475
line from 0.900,9.738 to 0.963,9.738
line from 4.787,9.738 to 4.725,9.738
line from 0.900,10.000 to 0.963,10.000
line from 4.787,10.000 to 4.725,10.000
line from 0.900,10.262 to 0.963,10.262
line from 4.787,10.262 to 4.725,10.262
line from 0.900,7.888 to 0.900,7.950
line from 0.900,10.262 to 0.900,10.200
line from 1.613,7.888 to 1.613,7.950
line from 1.613,10.262 to 1.613,10.200
line from 2.312,7.888 to 2.312,7.950
line from 2.312,10.262 to 2.312,10.200
line from 3.025,7.888 to 3.025,7.950
line from 3.025,10.262 to 3.025,10.200
line from 3.725,7.888 to 3.725,7.950
line from 3.725,10.262 to 3.725,10.200
line from 4.438,7.888 to 4.438,7.950
line from 4.438,10.262 to 4.438,10.200
line from 0.900,7.888 to 4.787,7.888
line from 4.787,7.888 to 4.787,10.262
line from 4.787,10.262 to 0.900,10.262
line from 0.900,10.262 to 0.900,7.888
line from 3.800,8.675 to 4.025,8.675
line from 1.250,8.800 to 1.250,8.800
line from 1.250,8.800 to 1.613,8.912
line from 1.613,8.912 to 2.312,9.113
line from 2.312,9.113 to 3.025,9.438
line from 3.025,9.438 to 3.725,9.750
line from 3.725,9.750 to 4.438,10.088
dashwid = 0.037i
line dotted from 3.800,8.525 to 4.025,8.525
line dotted from 1.250,8.637 to 1.250,8.637
line dotted from 1.250,8.637 to 1.613,8.700
line dotted from 1.613,8.700 to 2.312,8.713
line dotted from 2.312,8.713 to 3.025,8.775
line dotted from 3.025,8.775 to 3.725,8.887
line dotted from 3.725,8.887 to 4.438,9.037
line dashed from 3.800,8.375 to 4.025,8.375
line dashed from 1.250,8.675 to 1.250,8.675
line dashed from 1.250,8.675 to 1.613,8.688
line dashed from 1.613,8.688 to 2.312,8.713
line dashed from 2.312,8.713 to 3.025,8.825
line dashed from 3.025,8.825 to 3.725,8.887
line dashed from 3.725,8.887 to 4.438,9.062
dashwid = 0.075i
line dotted from 3.800,8.225 to 4.025,8.225
line dotted from 1.250,8.700 to 1.250,8.700
line dotted from 1.250,8.700 to 1.613,8.688
line dotted from 1.613,8.688 to 2.312,8.762
line dotted from 2.312,8.762 to 3.025,8.812
line dotted from 3.025,8.812 to 3.725,8.925
line dotted from 3.725,8.925 to 4.438,9.025
.ps
.ps -1
.ft
.ft I
"0" at 0.825,7.810 rjust
"50" at 0.825,8.072 rjust
"100" at 0.825,8.335 rjust
"150" at 0.825,8.597 rjust
"200" at 0.825,8.860 rjust
"250" at 0.825,9.135 rjust
"300" at 0.825,9.397 rjust
"350" at 0.825,9.660 rjust
"400" at 0.825,9.922 rjust
"450" at 0.825,10.185 rjust
"0" at 0.900,7.660
"2" at 1.613,7.660
"4" at 2.312,7.660
"6" at 3.025,7.660
"8" at 3.725,7.660
"10" at 4.438,7.660
"Time (sec)" at 0.150,8.997
"Number of Clients" at 2.837,7.510
"Figure #5: MAB Phase 5 (compile)" at 2.837,10.335
"NFS" at 3.725,8.597 rjust
"Leases" at 3.725,8.447 rjust
"Leases, Rdirlookup" at 3.725,8.297 rjust
"Leases, Attrib leases, Rdirlookup" at 3.725,8.147 rjust
.ps
.ft
.PE
.)z
.pp
In figure 2, where a subtree of seventy small files is copied, the difference between the protocol variants is minimal,
with the NQNFS variants performing slightly better.
For this case, the Readdir_and_Lookup RPC is a slight hindrance under heavy
load, possibly because it results in larger directory blocks in the buffer
cache.
.pp
In figure 3, for the phase that gets file attributes for a large number
of files, the leasing variants take about 50% longer, indicating that
there are performance problems in this area. For the case where valid
current leases are required for every file when attributes are returned,
the performance is significantly worse than when the attributes are allowed
to be stale by a few seconds on the client.
I have not been able to explain the oscillation in the curves for the
Lease cases.
.pp
For the string searching phase depicted in figure 4, the leasing variants
that do not require valid leases for files when attributes are returned
appear to scale better with server load than NFS.
However, the effect appears to be
negligible until the server load is fairly heavy.
.pp
Most of the time in the MAB benchmark is spent in the compilation phase
and this is where the differences between caching methods are most
pronounced.
In figure 5 it can be seen that any protocol variant using Leases performs
about a factor of two better than NFS
at a load of ten clients. This indicates that the use of NQNFS may
allow servers to handle significantly more clients for this type of
workload.
.pp
Table 2 summarizes the MAB run times for all phases for the single client
DECstation 5000/25. The \fILeases\fR case refers to using leases, whereas
the \fILeases, Rdirl\fR case uses the Readdir_and_Lookup RPC as well, and
the \fIBCache Only\fR case uses leases, but only the buffer cache and not
the attribute or name caches.
The \fINo Caching\fR case does not do any client-side caching, performing
all system calls via synchronous RPCs to the server.
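The \fI% Improvement\fR column appears to be the change in total elapsed time
relative to the NFS case; the following small C fragment (illustrative only,
and assuming that interpretation) recomputes it from the totals in the table.
.(l
/*
 * Illustrative only: recompute the "% Improvement" column of table 2,
 * assuming it is the reduction in total elapsed time relative to NFS.
 */
#include <stdio.h>

int
main(void)
{
        const char *variant[] = { "No Caching", "NFS", "BCache Only",
            "Leases, Rdirl", "Leases" };
        const double total[] = { 380, 197, 188, 171, 165 }; /* seconds */
        const double nfs = 197;                             /* NFS baseline */
        int i;

        for (i = 0; i < 5; i++) {
                printf("%-14s %4.0f sec  %+4.0f%%",
                    variant[i], total[i], 100.0 * (nfs - total[i]) / nfs);
                puts("");
        }
        return (0);
}
.)l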
.(z
.ps -1
.R
.TS
box, center;
c s s s s s s s
c c c c c c c c
l | n n n n n n n.
Table #2: Single DECstation 5000/25 Client Elapsed Times (sec)
Phase	1	2	3	4	5	Total	% Improvement
_
No Caching	6	35	41	40	258	380	-93
NFS	5	24	15	20	133	197	0
BCache Only	5	20	24	23	116	188	5
Leases, Rdirl	5	20	21	20	105	171	13
Leases	5	19	21	21	99	165	16
.TE
.ps
.)z
.sh 2 "Processor Speed Tests"
.pp
An important goal of client-side file system caching is to decouple the
I/O system calls from the underlying distributed file system, so that the
client's system performance might scale with processor speed. In order
to test this, a series of MAB runs was performed on three
DECstations that are similar except for processor speed.
In addition to the four protocol variants used for the above tests, runs
were done with the client caches turned off, to provide
worst-case performance numbers corresponding to a caching mechanism with a 100% miss rate. The CPU utilization
was measured, as an indicator of how much the processor was blocking for
I/O system calls. Note that since the systems were running in single user mode
and otherwise quiescent, almost all CPU activity was directly related
to the MAB run.
The results are presented in
table 3.
The CPU time is simply the product of the CPU utilization and the
elapsed running time and, as such, is an optimistic bound on the performance
achievable with an ideal client caching scheme that never blocks for I/O.
.(z
.ps -1
.R
.TS
box, center;
c s s s s s s s s s
c c s s c s s c s s
c c c c c c c c c c
c c c c c c c c c c
l | n n n n n n n n n.
Table #3: MAB Phase 5 (compile)
	DS2100 (10.5 MIPS)	DS3100 (14.0 MIPS)	DS5000/25 (26.7 MIPS)
	Elapsed	CPU	CPU	Elapsed	CPU	CPU	Elapsed	CPU	CPU
	time	Util(%)	time	time	Util(%)	time	time	Util(%)	time
_
Leases	143	89	127	113	87	98	99	89	88
Leases, Rdirl	150	89	134	110	91	100	105	88	92
BCache Only	169	85	144	129	78	101	116	75	87
NFS	172	77	132	135	74	100	133	71	94
No Caching	330	47	155	256	41	105	258	39	101
.TE
.ps
.)z
As can be seen in the table, any caching mechanism achieves significantly
better performance than when caching is disabled, roughly doubling the CPU
utilization with a corresponding reduction in run time. For NFS, the CPU
utilization is dropping with increase in CPU speed, which would suggest that
it is not scaling with CPU speed. For the NQNFS variants, the CPU utilization
remains at just below 90%, which suggests that the caching mechanism is working
well and scaling within this CPU range.
Note that for this benchmark, the ratio of CPU times for
the DECstation 3100 and DECstation 5000/25 is quite different from what the
Dhrystone MIPS ratings would suggest.
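As a concrete illustration of both points, the following small C fragment
(illustrative only) recomputes the CPU time bound for the \fILeases\fR case on
each machine from the figures in table 3 and compares the resulting speedups
with the ratios of the Dhrystone MIPS ratings.
.(l
/*
 * Illustrative only: CPU time = elapsed time * CPU utilization, using the
 * "Leases" figures from table 3, compared against the Dhrystone MIPS ratios.
 */
#include <stdio.h>

int
main(void)
{
        const char *machine[] = { "DS2100", "DS3100", "DS5000/25" };
        const double mips[]    = { 10.5, 14.0, 26.7 }; /* Dhrystone MIPS */
        const double elapsed[] = { 143, 113, 99 };     /* elapsed time (sec) */
        const double util[]    = { 89, 87, 89 };       /* CPU utilization (%) */
        double base = 143 * 89 / 100.0;                /* DS2100 CPU time */
        int i;

        for (i = 0; i < 3; i++) {
                double cpu = elapsed[i] * util[i] / 100.0;

                printf("%-10s CPU %5.1f sec  speedup %4.2fx  MIPS ratio %4.2fx",
                    machine[i], cpu, base / cpu, mips[i] / mips[0]);
                puts("");
        }
        return (0);
}
.)l
The CPU time for the DECstation 5000/25 improves by only about a factor of 1.4
over the DECstation 2100, while the MIPS ratings differ by roughly a factor
of 2.5.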
.pp
Overall, the results seem encouraging, although it remains to be seen whether
or not the caching provided by NQNFS can continue to scale with CPU
performance.
There is a good indication that NQNFS permits a server to scale
to more clients than does NFS, at least for workloads akin to the MAB compile phase.
A more difficult question is what happens if the server becomes much faster at
doing write RPCs, as a result of some technology such as Prestoserve
or write gathering.
Since a significant part of the difference between NFS and NQNFS is
the synchronous writing, it is difficult to predict to what extent a server
capable of fast write RPCs would negate the performance improvements of NQNFS.
At the very least, table 1 indicates that the write RPC load on the server
has decreased by approximately 30%, and this reduced write load should still
result in some improvement.
.pp
Indications are that the Readdir_and_Lookup RPC has not improved performance
for these tests and may in fact be degrading performance slightly.
The results in figure 3 indicate some problems, possibly with handling
of the attribute cache. It seems logical that the Readdir_and_Lookup RPC
should permit priming of the attribute cache, improving its hit rate, but the
results run counter to that.
.sh 2 "Internetwork Delay Tests"
.pp
This experimental setup was used to explore how the different protocol
variants might perform over internetworks with larger RPC RTTs. The
server was moved to a separate Ethernet, using a MicroVAXII\(tm as an
IP router to the other Ethernet. The 4.3Reno BSD Unix system running on the
MicroVAXII was modified to delay IP packets being forwarded by a tunable N
millisecond delay. The implementation was rather crude and did not try to
simulate a distribution of delay times nor was it programmed to drop packets
at a given rate, but it served as a simple emulation of a long,
fat network\** [Jacobson88].
.(f
\**Long fat networks refer to network interconnections with
a Bandwidth X RTT product > 10\u5\d bits.
.)f
The MAB was run using both UDP and TCP RPC transports
for a variety of RTT delays from five to two hundred milliseconds,
to observe the effects of RTT delay on RPC transport.
It was found that, due to high variability between runs, four runs were not
sufficient, so eight runs were done at each value.
The results in figure 6 and table 4 are the average for the eight runs.
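To put these delays in perspective, the long fat network threshold of
10\u5\d bits mentioned in the footnote is reached quickly at these RTTs.
The following small C fragment (illustrative only) computes the bandwidth
times RTT product for each delay tested; the 10 Mbit/s Ethernet bandwidth
used is an assumption, not a measured figure.
.(l
/*
 * Illustrative only: bandwidth * RTT products for the delays tested.
 * The 10 Mbit/s Ethernet bandwidth is an assumption, not a measured value.
 */
#include <stdio.h>

int
main(void)
{
        const double bandwidth = 10.0e6;  /* bits per second (assumed) */
        const double rtt_ms[] = { 5, 40, 80, 120, 160, 200 };
        int i;

        for (i = 0; i < 6; i++) {
                double bits = bandwidth * rtt_ms[i] / 1000.0;

                printf("RTT %3.0f ms  bandwidth * RTT = %8.0f bits  %s",
                    rtt_ms[i], bits,
                    bits > 1.0e5 ? "long fat network" : "below threshold");
                puts("");
        }
        return (0);
}
.)l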
.(z
.PS
.ps
.ps 10
dashwid = 0.050i
line dashed from 0.900,7.888 to 4.787,7.888
line dashed from 0.900,7.888 to 0.900,10.262
line from 0.900,7.888 to 0.963,7.888
line from 4.787,7.888 to 4.725,7.888
line from 0.900,8.350 to 0.963,8.350
line from 4.787,8.350 to 4.725,8.350
line from 0.900,8.800 to 0.963,8.800
line from 4.787,8.800 to 4.725,8.800
line from 0.900,9.262 to 0.963,9.262
line from 4.787,9.262 to 4.725,9.262
line from 0.900,9.713 to 0.963,9.713
line from 4.787,9.713 to 4.725,9.713
line from 0.900,10.175 to 0.963,10.175
line from 4.787,10.175 to 4.725,10.175
line from 0.900,7.888 to 0.900,7.950
line from 0.900,10.262 to 0.900,10.200
line from 1.825,7.888 to 1.825,7.950
line from 1.825,10.262 to 1.825,10.200
line from 2.750,7.888 to 2.750,7.950
line from 2.750,10.262 to 2.750,10.200
line from 3.675,7.888 to 3.675,7.950
line from 3.675,10.262 to 3.675,10.200
line from 4.600,7.888 to 4.600,7.950
line from 4.600,10.262 to 4.600,10.200
line from 0.900,7.888 to 4.787,7.888
line from 4.787,7.888 to 4.787,10.262
line from 4.787,10.262 to 0.900,10.262
line from 0.900,10.262 to 0.900,7.888
line from 4.125,8.613 to 4.350,8.613
line from 0.988,8.400 to 0.988,8.400
line from 0.988,8.400 to 1.637,8.575
line from 1.637,8.575 to 2.375,8.713
line from 2.375,8.713 to 3.125,8.900
line from 3.125,8.900 to 3.862,9.137
line from 3.862,9.137 to 4.600,9.425
dashwid = 0.037i
line dotted from 4.125,8.463 to 4.350,8.463
line dotted from 0.988,8.375 to 0.988,8.375
line dotted from 0.988,8.375 to 1.637,8.525
line dotted from 1.637,8.525 to 2.375,8.850
line dotted from 2.375,8.850 to 3.125,8.975
line dotted from 3.125,8.975 to 3.862,9.137
line dotted from 3.862,9.137 to 4.600,9.625
line dashed from 4.125,8.312 to 4.350,8.312
line dashed from 0.988,8.525 to 0.988,8.525
line dashed from 0.988,8.525 to 1.637,8.688
line dashed from 1.637,8.688 to 2.375,8.838
line dashed from 2.375,8.838 to 3.125,9.150
line dashed from 3.125,9.150 to 3.862,9.275
line dashed from 3.862,9.275 to 4.600,9.588
dashwid = 0.075i
line dotted from 4.125,8.162 to 4.350,8.162
line dotted from 0.988,8.525 to 0.988,8.525
line dotted from 0.988,8.525 to 1.637,8.838
line dotted from 1.637,8.838 to 2.375,8.863
line dotted from 2.375,8.863 to 3.125,9.137
line dotted from 3.125,9.137 to 3.862,9.387
line dotted from 3.862,9.387 to 4.600,10.200
.ps
.ps -1
.ft
.ft I
"0" at 0.825,7.810 rjust
"100" at 0.825,8.272 rjust
"200" at 0.825,8.722 rjust
"300" at 0.825,9.185 rjust
"400" at 0.825,9.635 rjust
"500" at 0.825,10.097 rjust
"0" at 0.900,7.660
"50" at 1.825,7.660
"100" at 2.750,7.660
"150" at 3.675,7.660
"200" at 4.600,7.660
"Time (sec)" at 0.150,8.997
"Round Trip Delay (msec)" at 2.837,7.510
"Figure #6: MAB Phase 5 (compile)" at 2.837,10.335
"Leases,UDP" at 4.050,8.535 rjust
"Leases,TCP" at 4.050,8.385 rjust
"NFS,UDP" at 4.050,8.235 rjust
"NFS,TCP" at 4.050,8.085 rjust
.ps
.ft
.PE
.)z
.(z
.ps -1
.R
.TS
box, center;
c s s s s s s s s
c c s c s c s c s
c c c c c c c c c
c c c c c c c c c
l | n n n n n n n n.
Table #4: MAB Phase 5 (compile) for Internetwork Delays
	NFS,UDP	NFS,TCP	Leases,UDP	Leases,TCP
Delay	Elapsed	Standard	Elapsed	Standard	Elapsed	Standard	Elapsed	Standard
(msec)	time (sec)	Deviation	time (sec)	Deviation	time (sec)	Deviation	time (sec)	Deviation
_
5	139	2.9	139	2.4	112	7.0	108	6.0
40	175	5.1	208	44.5	150	23.8	139	4.3
80	207	3.9	213	4.7	180	7.7	210	52.9
120	276	29.3	273	17.1	221	7.7	238	5.8
160	304	7.2	328	77.1	275	21.5	274	10.1
200	372	35.0	506	235.1	338	25.2	379	69.2
.TE
.ps
.)z
.pp
I found these results somewhat surprising, since I had assumed that stability
across an internetwork connection would be a function of RPC transport
protocol.
Looking at the standard deviations observed between the eight runs, there is an indication
that the NQNFS protocol plays a larger role in
maintaining stability than the underlying RPC transport protocol.
It appears that NFS over TCP transport
is the least stable variant tested.
It should be noted that the TCP implementation used was roughly at 4.3BSD Tahoe
release and that the 4.4BSD TCP implementation was far less stable and would
fail intermittently, due to a bug I was not able to isolate.
It would appear that some of the recent enhancements to the 4.4BSD TCP
implementation have a detrimental effect on the performance of
RPC-type traffic loads, which intermix small and large
data transfers in both directions.
It is obvious that more exploration of this area is needed before any
conclusions can be made
beyond the fact that over a local area network, TCP transport provides
performance comparable to UDP.
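One simple way to quantify this stability comparison is the ratio of standard
deviation to mean elapsed time; the following small C fragment (illustrative
only) computes it for the 200 msec row of table 4.
.(l
/*
 * Illustrative only: relative variability (standard deviation / mean) of
 * the 200 msec row of table 4, one way to compare the stability of the
 * four variants.
 */
#include <stdio.h>

int
main(void)
{
        const char *variant[] = { "NFS,UDP", "NFS,TCP", "Leases,UDP",
            "Leases,TCP" };
        const double mean[] = { 372, 506, 338, 379 };    /* seconds */
        const double sdev[] = { 35.0, 235.1, 25.2, 69.2 };
        int i;

        for (i = 0; i < 4; i++) {
                printf("%-11s mean %3.0f sec  std dev %5.1f  relative %4.1f%%",
                    variant[i], mean[i], sdev[i], 100.0 * sdev[i] / mean[i]);
                puts("");
        }
        return (0);
}
.)l
By this measure, NFS over TCP varies by almost half of its mean run time at
the 200 msec delay, while Leases over UDP varies by less than ten percent.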
.sh 1 "Lessons Learned"
.pp
Evaluating the performance of a distributed file system is fraught with
difficulties, due to the many software and hardware factors involved.
The limited benchmarking presented here took a considerable amount of time
and the results gained by the exercise only give indications of what the
performance might be for a few scenarios.
.pp
The IP router with delay introduction proved to be a valuable tool for protocol debugging\**,
.(f
\**It exposed two bugs in the 4.4BSD networking code, one a problem in the Lance chip
driver for the DECstation and the other a TCP window sizing problem that I was
not able to isolate.
.)f
and may be useful for a more extensive study of performance over internetworks
if enhanced to do a better job of simulating internetwork delay and packet loss.
.pp
The Leases mechanism provided a simple model for the provision of cache
consistency and did seem to improve performance for various scenarios.
Unfortunately, it does not provide the server state information that is required
for file system semantics, such as locking, that many software systems demand.
In production environments on my campus, the need for file locking and the correct
generation of the ETXTBSY error code
are far more important than full cache consistency, and leasing
does not satisfy these needs.
Another file system semantic that requires hard server state is the delay
of file removal until the last close system call. Although Spritely NFS
did not support this semantic either, it is logical that the open file
state maintained by that system would make this semantic easier to implement
than the Leases mechanism would.
.sh 1 "Further Work"
.pp
The current implementation uses a fixed, moderate sized buffer cache designed
for the local UFS [McKusick84] file system.
The results in figure 1 suggest that this is adequate so long as the cache
is of an appropriate size.
However, a mechanism permitting the cache to vary in size
has been shown to outperform fixed sized buffer caches [Nelson90], and could
be beneficial. It could also be useful to allow the buffer cache to grow very
large by making use of local backing store for cases where server performance
is limited.
A very large buffer cache size would in turn permit experimentation with
much larger read/write data sizes, facilitating bulk data transfers
across long fat networks, such as will characterize the Internet of the
near future.
A careful redesign of the buffer cache mechanism to provide
support for these features would probably be the next implementation step.
.pp
The results in figure 3 indicate that the mechanics of caching file
attributes and maintaining the attribute cache's consistency need to
be looked at further.
There also needs to be more work done on the interaction between a
Readdir_and_Lookup RPC and the name and attribute caches, in an effort
to reduce Getattr and Lookup RPC loads.
.pp
The NQNFS protocol has never been used in a production environment and doing
so would provide needed insight into how well the protocol satisfies the
needs of real workstation environments.
It is hoped that the distribution of the implementation in 4.4BSD will
facilitate use of the protocol in production environments elsewhere.
.pp
The big question that needs to be resolved is whether Leases are an adequate
mechanism for cache consistency or whether hard server state is required.
Given the work presented here and in the papers related to Sprite and Spritely
NFS, there are clear indications that a cache consistency algorithm can
improve both performance and file system semantics.
As yet, however, it is unclear what the best approach to maintaining consistency is.
It would appear that hard state information is required for file locking and
other mechanisms and, if so, it seems appropriate to use it for cache
consistency as well.
.sh 1 "Acknowledgements"
.pp
I would like to thank the members of the CSRG at the University of California,
Berkeley for their continued support over the years. Without their encouragement and assistance this
software would never have been implemented.
Prof. Jim Linders and Prof. Tom Wilson here at the University of Guelph helped
proofread this paper and Jeffrey Mogul provided a great deal of
assistance, helping to turn my gibberish into something at least moderately
readable.
.sh 1 "References"
.ip [Baker91] 15
Mary Baker and John Ousterhout, Availability in the Sprite Distributed
File System, In \fIOperating System Review\fR, (25)2, pg. 95-98,
April 1991.
.ip [Baker91a] 15
Mary Baker, private communication, May 1991.
.ip [Burrows88] 15
Michael Burrows, Efficient Data Sharing, Technical Report #153,
Computer Laboratory, University of Cambridge, Dec. 1988.
.ip [Gray89] 15
Cary G. Gray and David R. Cheriton, Leases: An Efficient Fault-Tolerant
Mechanism for Distributed File Cache Consistency, In \fIProc. of the
Twelfth ACM Symposium on Operating Systems Principles\fR, Litchfield Park,
AZ, Dec. 1989.
.ip [Howard88] 15
John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols,
M. Satyanarayanan, Robert N. Sidebotham and Michael J. West,
Scale and Performance in a Distributed File System, \fIACM Trans. on
Computer Systems\fR, (6)1, pg 51-81, Feb. 1988.
.ip [Jacobson88] 15
Van Jacobson and R. Braden, \fITCP Extensions for Long-Delay Paths\fR,
ARPANET Working Group Requests for Comment, DDN Network Information Center,
SRI International, Menlo Park, CA, October 1988, RFC-1072.
.ip [Jacobson89] 15
Van Jacobson, Sun NFS Performance Problems, \fIPrivate Communication,\fR
November, 1989.
.ip [Juszczak89] 15
Chet Juszczak, Improving the Performance and Correctness of an NFS Server,
In \fIProc. Winter 1989 USENIX Conference,\fR pg. 53-63, San Diego, CA, January 1989.
.ip [Juszczak94] 15
Chet Juszczak, Improving the Write Performance of an NFS Server,
to appear in \fIProc. Winter 1994 USENIX Conference,\fR San Francisco, CA, January 1994.
.ip [Kazar88] 15
Michael L. Kazar, Synchronization and Caching Issues in the Andrew File System,
In \fIProc. Winter 1988 USENIX Conference,\fR pg. 27-36, Dallas, TX, February
1988.
.ip [Kent87] 15
Christopher A. Kent and Jeffrey C. Mogul, \fIFragmentation Considered Harmful\fR, Research Report 87/3,
Digital Equipment Corporation Western Research Laboratory, Dec. 1987.
.ip [Kent87a] 15
Christopher A. Kent, \fICache Coherence in Distributed Systems\fR, Research Report 87/4,
Digital Equipment Corporation Western Research Laboratory, April 1987.
.ip [Macklem90] 15
Rick Macklem, Lessons Learned Tuning the 4.3BSD Reno Implementation of the
NFS Protocol,
In \fIProc. Winter 1991 USENIX Conference,\fR pg. 53-64, Dallas, TX,
January 1991.
.ip [Macklem93] 15
Rick Macklem, The 4.4BSD NFS Implementation,
In \fIThe System Manager's Manual\fR, 4.4 Berkeley Software Distribution,
University of California, Berkeley, June 1993.
.ip [McKusick84] 15
Marshall K. McKusick, William N. Joy, Samuel J. Leffler and Robert S. Fabry,
A Fast File System for UNIX, \fIACM Transactions on Computer Systems\fR,
Vol. 2, Number 3, pg. 181-197, August 1984.
.ip [McKusick90] 15
Marshall K. McKusick, Michael J. Karels and Keith Bostic, A Pageable Memory
Based Filesystem,
In \fIProc. Summer 1990 USENIX Conference,\fR pg. 137-143, Anaheim, CA, June
1990.
.ip [Mogul93] 15
Jeffrey C. Mogul, Recovery in Spritely NFS,
Research Report 93/2, Digital Equipment Corporation Western Research
Laboratory, June 1993.
.ip [Moran90] 15
Joseph Moran, Russel Sandberg, Don Coleman, Jonathan Kepecs and Bob Lyon,
Breaking Through the NFS Performance Barrier,
In \fIProc. Spring 1990 EUUG Conference,\fR pg. 199-206, Munich, FRG,
April 1990.
.ip [Nelson88] 15
Michael N. Nelson, Brent B. Welch, and John K. Ousterhout, Caching in the
Sprite Network File System, \fIACM Transactions on Computer Systems\fR (6)1
pg. 134-154, February 1988.
.ip [Nelson90] 15
Michael N. Nelson, \fIVirtual Memory vs. The File System\fR, Research Report
90/4, Digital Equipment Corporation Western Research Laboratory, March 1990.
.ip [Nowicki89] 15
Bill Nowicki, Transport Issues in the Network File System, In \fIComputer
Communication Review\fR, pg. 16-20, March 1989.
.ip [Ousterhout90] 15
John K. Ousterhout, Why Aren't Operating Systems Getting Faster As Fast as
Hardware? In \fIProc. Summer 1990 USENIX Conference\fR, pg. 247-256, Anaheim,
CA, June 1990.
.ip [Sandberg85] 15
Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon,
Design and Implementation of the Sun Network filesystem, In \fIProc. Summer
1985 USENIX Conference\fR, pages 119-130, Portland, OR, June 1985.
.ip [Srinivasan89] 15
V. Srinivasan and Jeffrey C. Mogul, Spritely NFS: Experiments with
Cache-Consistency Protocols,
In \fIProc. of the
Twelfth ACM Symposium on Operating Systems Principles\fR, Litchfield Park,
AZ, Dec. 1989.
.ip [Steiner88] 15
J. G. Steiner, B. C. Neuman and J. I. Schiller, Kerberos: An Authentication
Service for Open Network Systems,
In \fIProc. Winter 1988 USENIX Conference,\fR pg. 191-202, Dallas, TX, February
1988.
.ip [SUN89] 15
Sun Microsystems Inc., \fINFS: Network File System Protocol Specification\fR,
ARPANET Working Group Requests for Comment, DDN Network Information Center,
SRI International, Menlo Park, CA, March 1989, RFC-1094.
.ip [SUN93] 15
Sun Microsystems Inc., \fINFS: Network File System Version 3 Protocol Specification\fR,
Sun Microsystems Inc., Mountain View, CA, June 1993.
.ip [Wittle93] 15
Mark Wittle and Bruce E. Keith, LADDIS: The Next Generation in NFS File
Server Benchmarking,
In \fIProc. Summer 1993 USENIX Conference,\fR pg. 111-128, Cincinnati, OH, June
1993.
.(f
\(mo
NFS is believed to be a trademark of Sun Microsystems, Inc.
.)f
.(f
\(dg
Prestoserve is a trademark of Legato Systems, Inc.
.)f
.(f
\(sc
MIPS is a trademark of Silicon Graphics, Inc.
.)f
.(f
\(dg
DECstation, MicroVAXII and Ultrix are trademarks of Digital Equipment Corp.
.)f
.(f
\(dd
Unix is a trademark of Novell, Inc.
.)f