Training courses

Kernel and Embedded Linux

Bootlin training courses

Embedded Linux, kernel,
Yocto Project, Buildroot, real-time,
graphics, boot time, debugging...

Bootlin logo

Elixir Cross Referencer

   1
   2
   3
   4
   5
   6
   7
   8
   9
  10
  11
  12
  13
  14
  15
  16
  17
  18
  19
  20
  21
  22
  23
  24
  25
  26
  27
  28
  29
  30
  31
  32
  33
  34
  35
  36
  37
  38
  39
  40
  41
  42
  43
  44
  45
  46
  47
  48
  49
  50
  51
  52
  53
  54
  55
  56
  57
  58
  59
  60
  61
  62
  63
  64
  65
  66
  67
  68
  69
  70
  71
  72
  73
  74
  75
  76
  77
  78
  79
  80
  81
  82
  83
  84
  85
  86
  87
  88
  89
  90
  91
  92
  93
  94
  95
  96
  97
  98
  99
 100
 101
 102
 103
 104
 105
 106
 107
 108
 109
 110
 111
 112
 113
 114
 115
 116
 117
 118
 119
 120
 121
 122
 123
 124
 125
 126
 127
 128
 129
 130
 131
 132
 133
 134
 135
 136
 137
 138
 139
 140
 141
 142
 143
 144
 145
 146
 147
 148
 149
 150
 151
 152
 153
 154
 155
 156
 157
 158
 159
 160
 161
 162
 163
 164
 165
 166
 167
 168
 169
 170
 171
 172
 173
 174
 175
 176
 177
 178
 179
 180
 181
 182
 183
 184
 185
 186
 187
 188
 189
 190
 191
 192
 193
 194
 195
 196
 197
 198
 199
 200
 201
 202
 203
 204
 205
 206
 207
 208
 209
 210
 211
 212
 213
 214
 215
 216
 217
 218
 219
 220
 221
 222
 223
 224
 225
 226
 227
 228
 229
 230
 231
 232
 233
 234
 235
 236
 237
 238
 239
 240
 241
 242
 243
 244
 245
 246
 247
 248
 249
 250
 251
 252
 253
 254
 255
 256
 257
 258
 259
 260
 261
 262
 263
 264
 265
 266
 267
 268
 269
 270
 271
 272
 273
 274
 275
 276
 277
 278
 279
 280
 281
 282
 283
 284
 285
 286
 287
 288
 289
 290
 291
 292
 293
 294
 295
 296
 297
 298
 299
 300
 301
 302
 303
 304
 305
 306
 307
 308
 309
 310
 311
 312
 313
 314
 315
 316
 317
 318
 319
 320
 321
 322
 323
 324
 325
 326
 327
 328
 329
 330
 331
 332
 333
 334
 335
 336
 337
 338
 339
 340
 341
 342
 343
 344
 345
 346
 347
 348
 349
 350
 351
 352
 353
 354
 355
 356
 357
 358
 359
 360
 361
 362
 363
 364
 365
 366
 367
 368
 369
 370
 371
 372
 373
 374
 375
 376
 377
 378
 379
 380
 381
 382
 383
 384
 385
 386
 387
 388
 389
 390
 391
 392
 393
 394
 395
 396
 397
 398
 399
 400
 401
 402
 403
 404
 405
 406
 407
 408
 409
 410
 411
 412
 413
 414
 415
 416
 417
 418
 419
 420
 421
 422
 423
 424
 425
 426
 427
 428
 429
 430
 431
 432
 433
 434
 435
 436
 437
 438
 439
 440
 441
 442
 443
 444
 445
 446
 447
 448
 449
 450
 451
 452
 453
 454
 455
 456
 457
 458
 459
 460
 461
 462
 463
 464
 465
 466
 467
 468
 469
 470
 471
 472
 473
 474
 475
 476
 477
 478
 479
 480
 481
 482
 483
 484
 485
 486
 487
 488
 489
 490
 491
 492
 493
 494
 495
 496
 497
 498
 499
 500
 501
 502
 503
 504
 505
 506
 507
 508
 509
 510
 511
 512
 513
 514
 515
 516
 517
 518
 519
 520
 521
 522
 523
 524
 525
 526
 527
 528
 529
 530
 531
 532
 533
 534
 535
 536
 537
 538
 539
 540
 541
 542
 543
 544
 545
 546
 547
 548
 549
 550
 551
 552
 553
 554
 555
 556
 557
 558
 559
 560
 561
 562
 563
 564
 565
 566
 567
 568
 569
 570
 571
 572
 573
 574
 575
 576
 577
 578
 579
 580
 581
 582
 583
 584
 585
 586
 587
 588
 589
 590
 591
 592
 593
 594
 595
 596
 597
 598
 599
 600
 601
 602
 603
 604
 605
 606
 607
 608
 609
 610
 611
 612
 613
 614
 615
 616
 617
 618
 619
 620
 621
 622
 623
 624
 625
 626
 627
 628
 629
 630
 631
 632
 633
 634
 635
 636
 637
 638
 639
 640
 641
 642
 643
 644
 645
 646
 647
 648
 649
 650
 651
 652
 653
 654
 655
 656
 657
 658
 659
 660
 661
 662
 663
 664
 665
 666
 667
 668
 669
 670
 671
 672
 673
 674
 675
 676
 677
 678
 679
 680
 681
 682
 683
 684
 685
 686
 687
 688
 689
 690
 691
 692
 693
 694
 695
 696
 697
 698
 699
 700
 701
 702
 703
 704
 705
 706
 707
 708
 709
 710
 711
 712
 713
 714
 715
 716
 717
 718
 719
 720
 721
 722
 723
 724
 725
 726
 727
 728
 729
 730
 731
 732
 733
 734
 735
 736
 737
 738
 739
 740
 741
 742
 743
 744
 745
 746
 747
 748
 749
 750
 751
 752
 753
 754
 755
 756
 757
 758
 759
 760
 761
 762
 763
 764
 765
 766
 767
 768
 769
 770
 771
 772
 773
 774
 775
 776
 777
 778
 779
 780
 781
 782
 783
 784
 785
 786
 787
 788
 789
 790
 791
 792
 793
 794
 795
 796
 797
 798
 799
 800
 801
 802
 803
 804
 805
 806
 807
 808
 809
 810
 811
 812
 813
 814
 815
 816
 817
 818
 819
 820
 821
 822
 823
 824
 825
 826
 827
 828
 829
 830
 831
 832
 833
 834
 835
 836
 837
 838
 839
 840
 841
 842
 843
 844
 845
 846
 847
 848
 849
 850
 851
 852
 853
 854
 855
 856
 857
 858
 859
 860
 861
 862
 863
 864
 865
 866
 867
 868
 869
 870
 871
 872
 873
 874
 875
 876
 877
 878
 879
 880
 881
 882
 883
 884
 885
 886
 887
 888
 889
 890
 891
 892
 893
 894
 895
 896
 897
 898
 899
 900
 901
 902
 903
 904
 905
 906
 907
 908
 909
 910
 911
 912
 913
 914
 915
 916
 917
 918
 919
 920
 921
 922
 923
 924
 925
 926
 927
 928
 929
 930
 931
 932
 933
 934
 935
 936
 937
 938
 939
 940
 941
 942
 943
 944
 945
 946
 947
 948
 949
 950
 951
 952
 953
 954
 955
 956
 957
 958
 959
 960
 961
 962
 963
 964
 965
 966
 967
 968
 969
 970
 971
 972
 973
 974
 975
 976
 977
 978
 979
 980
 981
 982
 983
 984
 985
 986
 987
 988
 989
 990
 991
 992
 993
 994
 995
 996
 997
 998
 999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
.\" $NetBSD: ctf.5,v 1.3 2015/09/29 06:33:01 wiz Exp $
.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2014 Joyent, Inc.
.\"
.Dd September 26, 2014
.Dt CTF 5
.Os
.Sh NAME
.Nm ctf
.Nd Compact C Type Format
.Sh SYNOPSIS
.In sys/ctf.h
.Sh DESCRIPTION
.Nm
is designed to be a compact representation of the C programming
language's type information focused on serving the needs of dynamic
tracing, debuggers, and other in-situ and post-mortem introspection
tools.
.Nm
data is generally included in
.Sy ELF
objects and is tagged as
.Sy SHT_PROGBITS
to ensure that the data is accessible in a running process and in subsequent
core dumps, if generated.
.Lp
The
.Nm
data contained in each file has information about the layout and
sizes of C types, including intrinsic types, enumerations, structures,
typedefs, and unions, that are used by the corresponding
.Sy ELF
object.
The
.Nm
data may also include information about the types of global objects and
the return type and arguments of functions in the symbol table.
.Lp
Because a
.Nm
file is often embedded inside a file, rather than being a standalone
file itself, it may also be referred to as a
.Nm
.Sy container .
.Lp
On illumos systems,
.Nm
data is consumed by multiple programs.
It can be used by
.\" the modular
.\" debugger,
.\" .Xr mdb 1 ,
.\" as well as by
.Xr dtrace 1 .
Programmatic access to
.Nm
data can be obtained through
.Xr libctf 3 .
.Lp
The
.Nm
file format is broken down into seven different sections.
The first
section is the
.Sy preamble
and
.Sy header ,
which describes the version of the
.Nm
file, links it has to other
.Nm
files, and the sizes of the other sections.
The next section is the
.Sy label
section,
which provides a way of identifying similar groups of
.Nm
data across multiple files.
This is followed by the
.Sy object
information section, which describes the type of global
symbols.
The subsequent section is the
.Sy function
information section, which describes the return
types and arguments of functions.
The next section is the
.Sy type
information section, which describes
the format and layout of the C types themselves, and finally the last
section is the
.Sy string
section, which contains the names of types, enumerations, members, and
labels.
.Lp
While strictly speaking, only the
.Sy preamble
and
.Sy header
are required, to be actually useful, both the type and string
sections are necessary.
.Lp
A
.Nm
file may contain all of the type information that it requires, or it
may optionally refer to another
.Nm
file which holds the remaining types.
When a
.Nm
file refers to another file, it is called the
.Sy child
and the file it refers to is called the
.Sy parent .
A given file may only refer to one parent.
This process is called
.Em uniquification
because it ensures each child only has type information that is
unique to it.
A common example of this is that most kernel modules in
illumos are uniquified against the kernel module
.Sy genunix
and the type information that comes from the
.Sy IP
module.
This means that a module only has types that are unique to
itself and the most common types in the kernel are not duplicated.
.Sh FILE FORMAT
This documents version
.Em two
of the
.Nm
file format.
All applications and tools currently produce and operate on
this version.
.Lp
The file format can be summarized with the following image, the
following sections will cover this in more detail.
.Bd -literal

         +-------------+  0t0
+--------| Preamble    |
|        +-------------+  0t4
|+-------| Header      |
||       +-------------+  0t36 + cth_lbloff
||+------| Labels      |
|||      +-------------+  0t36 + cth_objtoff
|||+-----| Objects     |
||||     +-------------+  0t36 + cth_funcoff
||||+----| Functions   |
|||||    +-------------+  0t36 + cth_typeoff
|||||+---| Types       |
||||||   +-------------+  0t36 + cth_stroff
||||||+--| Strings     |
|||||||  +-------------+  0t36 + cth_stroff + cth_strlen
|||||||
|||||||
|||||||
|||||||    +-- magic -   vers   flags
|||||||    |          |    |      |
|||||||   +------+------+------+------+
+---------| 0xcf | 0xf1 | 0x02 | 0x00 |
 ||||||   +------+------+------+------+
 ||||||   0      1      2      3      4
 ||||||
 ||||||    + parent label        + objects
 ||||||    |       + parent name |     + functions    + strings
 ||||||    |       |     + label |     |      + types |       + strlen
 ||||||    |       |     |       |     |      |       |       |
 ||||||   +------+------+------+------+------+-------+-------+-------+
 +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
  |||||   +------+------+------+------+------+-------+-------+-------+
  |||||   0x04   0x08   0x0c   0x10   0x14    0x18    0x1c    0x20   0x24
  |||||
  |||||         + Label name
  |||||         |       + Label type
  |||||         |       |       + Next label
  |||||         |       |       |
  |||||       +-------+------+-----+
  +-----------| 0x01  | 0x42 | ... |
   ||||       +-------+------+-----+
   ||||  cth_lbloff   +0x4   +0x8  cth_objtoff
   ||||
   ||||
   |||| Symidx  0t15   0t43   0t44
   ||||       +------+------+------+-----+
   +----------| 0x00 | 0x42 | 0x36 | ... |
    |||       +------+------+------+-----+
    ||| cth_objtoff  +0x2   +0x4   +0x6   cth_funcoff
    |||
    |||        + CTF_TYPE_INFO         + CTF_TYPE_INFO
    |||        |        + Return type  |
    |||        |        |       + arg0 |
    |||       +--------+------+------+-----+
    +---------| 0x2c10 | 0x08 | 0x0c | ... |
     ||       +--------+------+------+-----+
     || cth_funcff     +0x2   +0x4   +0x6  cth_typeoff
     ||
     ||         + ctf_stype_t for type 1
     ||         |  integer           + integer encoding
     ||         |                    |          + ctf_stype_t for type 2
     ||         |                    |          |
     ||       +--------------------+-----------+-----+
     +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
      |       +--------------------+-----------+-----+
      | cth_typeoff               +0x08      +0x0c  cth_stroff
      |
      |     +--- str 0
      |     |    +--- str 1       + str 2
      |     |    |                |
      |     v    v                v
      |   +----+---+---+---+----+---+---+---+---+---+----+
      +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 |
          +----+---+---+---+----+---+---+---+---+---+----+
          0    1   2   3   4    5   6   7   8   9   10   11
.Ed
.Lp
Every
.Nm
file begins with a
.Sy preamble ,
followed by a
.Sy header .
The
.Sy preamble
is defined as follows:
.Bd -literal
typedef struct ctf_preamble {
	ushort_t ctp_magic;	/* magic number (CTF_MAGIC) */
	uchar_t ctp_version;	/* data format version number (CTF_VERSION) */
	uchar_t ctp_flags;	/* flags (see below) */
} ctf_preamble_t;
.Ed
.Pp
The
.Sy preamble
is four bytes long and must be four byte aligned.
This
.Sy preamble
defines the version of the
.Nm
file which defines the format of the rest of the header.
While the
header may change in subsequent versions, the preamble will not change
across versions, though the interpretation of its flags may change from
version to version.
The
.Em ctp_magic
member defines the magic number for the
.Nm
file format.
This must always be
.Li 0xcff1 .
If another value is encountered, then the file should not be treated as
a
.Nm
file.
The
.Em ctp_version
member defines the version of the
.Nm
file.
The current version is
.Li 2 .
It is possible to encounter an unsupported version.
In that case,
software should not try to parse the format, as it may have changed.
Finally, the
.Em ctp_flags
member describes aspects of the file which modify its interpretation.
The following flags are currently defined:
.Bd -literal
#define	CTF_F_COMPRESS		0x01
.Ed
.Pp
The flag
.Sy CTF_F_COMPRESS
indicates that the body of the
.Nm
file, all the data following the
.Sy header ,
has been compressed through the
.Sy zlib
library and its
.Sy deflate
algorithm.
If this flag is not present, then the body has not been
compressed and no special action is needed to interpret it.
All offsets
into the data as described by
.Sy header ,
always refer to the
.Sy uncompressed
data.
.Lp
In version two of the
.Nm
file format, the
.Sy header
denotes whether whether or not this
.Nm
file is the child of another
.Nm
file and also indicates the size of the remaining sections.
The
structure for the
.Sy header ,
logically contains a copy of the
.Sy preamble
and the two have a combined size of 36 bytes.
.Bd -literal
typedef struct ctf_header {
	ctf_preamble_t cth_preamble;
	uint_t cth_parlabel;	/* ref to name of parent lbl uniq'd against */
	uint_t cth_parname;	/* ref to basename of parent */
	uint_t cth_lbloff;	/* offset of label section */
	uint_t cth_objtoff;	/* offset of object section */
	uint_t cth_funcoff;	/* offset of function section */
	uint_t cth_typeoff;	/* offset of type section */
	uint_t cth_stroff;	/* offset of string section */
	uint_t cth_strlen;	/* length of string section in bytes */
} ctf_header_t;
.Ed
.Pp
After the
.Sy preamble ,
the next two members
.Em cth_parlablel
and
.Em cth_parname ,
are used to identify the parent.
The value of both members are offsets
into the
.Sy string
section which point to the start of a null-terminated string.
For more
information on the encoding of strings, see the subsection on
.Sx String Identifiers .
If the value of either is zero, then there is no entry for that
member.
If the member
.Em cth_parlabel
is set, then the
.Em ctf_parname
member must be set, otherwise it will not be possible to find the
parent.
If
.Em ctf_parname
is set, it is not necessary to define
.Em cth_parlabel ,
as the parent may not have a label.
For more information on labels
and their interpretation, see
.Sx The Label Section .
.Lp
The remaining members (excepting
.Em cth_strlen )
describe the beginning of the corresponding sections.
These offsets are
relative to the end of the
.Sy header .
Therefore, something with an offset of 0 is at an offset of thirty-six
bytes relative to the start of the
.Nm
file.
The difference between members
indicates the size of the section itself.
Different offsets have
different alignment requirements.
The start of the
.Em cth_objotoff
and
.Em cth_funcoff
must be two byte aligned, while the sections
.Em cth_lbloff
and
.Em cth_typeoff
must be four-byte aligned.
The section
.Em cth_stroff
has no alignment requirements.
To calculate the size of a given section,
excepting the
.Sy string
section, one should subtract the offset of the section from the following one.
For
example, the size of the
.Sy types
section can be calculated by subtracting
.Em cth_stroff
from
.Em cth_typeoff .
.Lp
Finally, the member
.Em cth_strlen
describes the length of the string section itself.
From it, you can also
calculate the size of the entire
.Nm
file by adding together the size of the
.Sy ctf_header_t ,
the offset of the string section in
.Em cth_stroff ,
and the size of the string section in
.Em cth_srlen .
.Ss Type Identifiers
Through the
.Nm ctf
data, types are referred to by identifiers.
A given
.Nm
file supports up to 32767 (0x7fff) types.
The first valid type identifier is 0x1.
When a given
.Nm
file is a child, indicated by a non-zero entry for the
.Sy header Ns 's
.Em cth_parname ,
then the first valid type identifier is 0x8000 and the last is 0xffff.
In this case, type identifiers 0x1 through 0x7fff are references to the
parent.
.Lp
The type identifier zero is a sentinel value used to indicate that there
is no type information available or it is an unknown type.
.Lp
Throughout the file format, the identifier is stored in different sized
values; however, the minimum size to represent a given identifier is a
.Sy uint16_t .
Other consumers of
.Nm
information may use larger or opaque identifiers.
.Ss String Identifiers
String identifiers are always encoded as four byte unsigned integers
which are an offset into a string table.
The
.Nm
format supports two different string tables which have an identifier of
zero or one.
This identifier is stored in the high-order bit of the
unsigned four byte offset.
Therefore, the maximum supported offset into
one of these tables is 0x7ffffffff.
.Lp
Table identifier zero, always refers to the
.Sy string
section in the CTF file itself.
String table identifier one refers to an
external string table which is the ELF string table for the ELF symbol
table associated with the
.Nm
container.
.Ss Type Encoding
Every
.Nm
type begins with metadata encoded into a
.Sy uint16_t .
This encoded information tells us three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The kind of the type
.It
Whether this type is a root type or not
.It
The length of the variable data
.El
.Lp
The 16 bits that make up the encoding are broken down such that you have
five bits for the kind, one bit for indicating whether or not it is a
root type, and 10 bits for the variable length.
This is laid out as
follows:
.Bd -literal -offset indent
+--------------------+
| kind | root | vlen |
+--------------------+
15   11   10   9    0
.Ed
.Lp
The current version of the file format defines 14 different kinds.
The
interpretation of these different kinds will be discussed in the section
.Sx The Type Section .
If a kind is encountered that is not listed below, then it is not a valid
.Nm
file.
The kinds are defined as follows:
.Bd -literal -offset indent
#define	CTF_K_UNKNOWN	0
#define	CTF_K_INTEGER	1
#define	CTF_K_FLOAT	2
#define	CTF_K_POINTER	3
#define	CTF_K_ARRAY	4
#define	CTF_K_FUNCTION	5
#define	CTF_K_STRUCT	6
#define	CTF_K_UNION	7
#define	CTF_K_ENUM	8
#define	CTF_K_FORWARD	9
#define	CTF_K_TYPEDEF	10
#define	CTF_K_VOLATILE	11
#define	CTF_K_CONST	12
#define	CTF_K_RESTRICT	13
.Ed
.Lp
Programs directly reference many types; however, other types are referenced
indirectly because they are part of some other structure.
These types that are
referenced directly and used are called
.Sy root
types.
Other types may be used indirectly, for example, a program may reference
a structure directly, but not one of its members which has a type.
That type is
not considered a
.Sy root
type.
If a type is a
.Sy root
type, then it will have bit 10 set.
.Lp
The variable length section is specific to each kind and is discussed in the
section
.Sx The Type Section .
.Lp
The following macros are useful for constructing and deconstructing the encoded
type information:
.Bd -literal -offset indent

#define	CTF_MAX_VLEN	0x3ff
#define	CTF_INFO_KIND(info)	(((info) & 0xf800) >> 11)
#define	CTF_INFO_ISROOT(info)	(((info) & 0x0400) >> 10)
#define	CTF_INFO_VLEN(info)	(((info) & CTF_MAX_VLEN))

#define	CTF_TYPE_INFO(kind, isroot, vlen) \\
	(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))
.Ed
.Ss The Label Section
When consuming
.Nm
data, it is often useful to know whether two different
.Nm
containers come from the same source base and version.
For example, when
building illumos, there are many kernel modules that are built against a
single collection of source code.
A label is encoded into the
.Nm
files that corresponds with the particular build.
This ensures that if
files on the system were to become mixed up from multiple releases, that
they are not used together by tools, particularly when a child needs to
refer to a type in the parent.
Because they are linked used the type
identifiers, if the wrong parent is used then the wrong type will be
encountered.
.Lp
Each label is encoded in the file format using the following eight byte
structure:
.Bd -literal
typedef struct ctf_lblent {
	uint_t ctl_label;	/* ref to name of label */
	uint_t ctl_typeidx;	/* last type associated with this label */
} ctf_lblent_t;
.Ed
.Lp
Each label has two different components, a name and a type identifier.
The name is encoded in the
.Em ctl_label
member which is in the format defined in the section
.Sx String Identifiers .
Generally, the names of all labels are found in the internal string
section.
.Lp
The type identifier encoded in the member
.Em ctl_typeidx
refers to the last type identifier that a label refers to in the current
file.
Labels only refer to types in the current file, if the
.Nm
file is a child, then it will have the same label as its parent;
however, its label will only refer to its types, not its parents.
.Lp
It is also possible, though rather uncommon, for a
.Nm
file to have multiple labels.
Labels are placed one after another, every
eight bytes.
When multiple labels are present, types may only belong to
a single label.
.Ss The Object Section
The object section provides a mapping from ELF symbols of type
.Sy STT_OBJECT
in the symbol table to a type identifier.
Every entry in this section is
a
.Sy uint16_t
which contains a type identifier as described in the section
.Sx Type Identifiers .
If there is no information for an object, then the type identifier 0x0
is stored for that entry.
.Lp
To walk the object section, you need to have a corresponding
.Sy symbol table
in the ELF object that contains the
.Nm
data.
Not every object is included in this section.
Specifically, when
walking the symbol table.
An entry is skipped if it matches any of the
following conditions:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_OBJECT
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's section index is
.Sy SHN_ABS
and the value of the symbol is zero.
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Lp
The following sample code shows an example of iterating the object
section and skipping the correct symbols:
.Bd -literal
#include <gelf.h>
#include <stdio.h>

/*
 * Given the start of the object section in the CTF file, the number of symbols,
 * and the ELF Data sections for the symbol table and the string table, this
 * prints the type identifiers that correspond to objects. Note, a more robust
 * implementation should ensure that they don't walk beyond the end of the CTF
 * object section.
 */
static int
walk_symbols(uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
    long nsyms)
{
	long i;
	uintptr_t strbase = strdata->d_buf;

	for (i = 1; i < nsyms; i++, objftoff++) {
		const char *name;
		GElf_Sym sym;

		if (gelf_getsym(symdata, i, &sym) == NULL)
			return (1);

		if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
			continue;
		if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
			continue;
		if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
			continue;
		name = (const char *)(strbase + sym.st_name);
		if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
			continue;

		(void) printf("Symbol %d has type %d\n", i, *objtoff);
	}

	return (0);
}
.Ed
.Ss The Function Section
The function section of the
.Nm
file encodes the types of both the function's arguments and the function's
return type.
Similar to
.Sx The Object Section ,
the function section encodes information for all symbols of type
.Sy STT_FUNCTION ,
excepting those that fit specific criteria.
Unlike with objects, because
functions have a variable number of arguments, they start with a type encoding
as defined in
.Sx Type Encoding ,
which is the size of a
.Sy uint16_t .
For functions which have no type information available, they are encoded as
.Li CTF_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) .
Functions with arguments are encoded differently.
Here, the variable length is
turned into the number of arguments in the function.
If a function is a
.Sy varargs
type function, then the number of arguments is increased by one.
Functions with
type information are encoded as:
.Li CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) .
.Lp
For functions that have no type information, nothing else is encoded, and the
next function is encoded.
For functions with type information, the next
.Sy uint16_t
is encoded with the type identifier of the return type of the function.
It is
followed by each of the type identifiers of the arguments, if any exist, in the
order that they appear in the function.
Therefore, argument 0 is the first type
identifier and so on.
When a function has a final varargs argument, that is
encoded with the type identifier of zero.
.Lp
Like
.Sx The Object Section ,
the function section is encoded in the order of the symbol table.
It has
similar, but slightly different considerations from objects.
While iterating the
symbol table, if any of the following conditions are true, then the entry is
skipped and no corresponding entry is written:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_FUNCTION
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Ss The Type Section
The type section is the heart of the
.Nm
data.
It encodes all of the information about the types themselves.
The base of
the type information comes in two forms, a short form and a long form, each of
which may be followed by a variable number of arguments.
The following
definitions describe the short and long forms:
.Bd -literal
#define	CTF_MAX_SIZE	0xfffe	/* max size of a type in bytes */
#define	CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */
#define	CTF_MAX_LSIZE	UINT64_MAX

typedef struct ctf_stype {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* size of entire type in bytes */
		ushort_t _type;	/* reference to another type */
	} _u;
} ctf_stype_t;

typedef struct ctf_type {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* always CTF_LSIZE_SENT */
		ushort_t _type; /* do not use */
	} _u;
	uint_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
	uint_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
} ctf_type_t;

#define	ctt_size _u._size	/* for fundamental types that have a size */
#define	ctt_type _u._type	/* for types that reference another type */
.Ed
.Pp
Type sizes are stored in
.Sy bytes .
The basic small form uses a
.Sy ushort_t
to store the number of bytes.
If the number of bytes in a structure would exceed
0xfffe, then the alternate form, the
.Sy ctf_type_t ,
is used instead.
To indicate that the larger form is being used, the member
.Em ctt_size
is set to value of
.Sy CTF_LSIZE_SENT
(0xffff).
In general, when going through the type section, consumers use the
.Sy ctf_type_t
structure, but pay attention to the value of the member
.Em ctt_size
to determine whether they should increment their scan by the size of the
.Sy ctf_stype_t
or
.Sy ctf_type_t .
Not all kinds of types use
.Sy ctt_size .
Those which do not, will always use the
.Sy ctf_stype_t
structure.
The individual sections for each kind have more information.
.Lp
Types are written out in order.
Therefore the first entry encountered has a type
id of 0x1, or 0x8000 if a child.
The member
.Em ctt_name
is encoded as described in the section
.Sx String Identifiers .
The string that it points to is the name of the type.
If the identifier points
to an empty string (one that consists solely of a null terminator) then the type
does not have a name, this is common with anonymous structures and unions that
only have a typedef to name them, as well as, pointers and qualifiers.
.Lp
The next member, the
.Em ctt_info ,
is encoded as described in the section
.Sx Type Encoding .
The types kind tells us how to interpret the remaining data in the
.Sy ctf_type_t
and any variable length data that may exist.
The rest of this section will be
broken down into the interpretation of the various kinds.
.Ss Encoding of Integers
Integers, which are of type
.Sy CTF_K_INTEGER ,
have no variable length arguments.
Instead, they are followed by a four byte
.Sy uint_t
which describes their encoding.
All integers must be encoded with a variable
length of zero.
The
.Em ctt_size
member describes the length of the integer in bytes.
In general, integer sizes
will be rounded up to the closest power of two.
.Lp
The integer encoding contains three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The encoding of the integer
.It
The offset in
.Sy bits
of the type
.It
The size in
.Sy bits
of the type
.El
.Pp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_INT_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Pp
The following flags are defined for the encoding at this time:
.Bd -literal -offset indent
#define	CTF_INT_SIGNED		0x01
#define	CTF_INT_CHAR		0x02
#define	CTF_INT_BOOL		0x04
#define	CTF_INT_VARARGS		0x08
.Ed
.Lp
By default, an integer is considered to be unsigned, unless it has the
.Sy CTF_INT_SIGNED
flag set.
If the flag
.Sy CTF_INT_CHAR
is set, that indicates that the integer is of a type that stores character
data, for example the intrinsic C type
.Sy char
would have the
.Sy CTF_INT_CHAR
flag set.
If the flag
.Sy CTF_INT_BOOL
is set, that indicates that the integer represents a boolean type.
For example,
the intrinsic C type
.Sy _Bool
would have the
.Sy CTF_INT_BOOL
flag set.
Finally, the flag
.Sy CTF_INT_VARARGS
indicates that the integer is used as part of a variable number of arguments.
This encoding is rather uncommon.
.Ss Encoding of Floats
Floats, which are of type
.Sy CTF_K_FLOAT ,
are similar to their integer counterparts.
They have no variable length
arguments and are followed by a four byte encoding which describes the kind of
float that exists.
The
.Em ctt_size
member is the size, in bytes, of the float.
The float encoding has three
different pieces of information inside of it:
.Lp
.Bl -bullet -offset indent -compact
.It
The specific kind of float that exists
.It
The offset in
.Sy bits
of the float
.It
The size in
.Sy bits
of the float
.El
.Lp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_FP_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_FP_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_FP_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Lp
Where as the encoding for integers was a series of flags, the encoding for
floats maps to a specific kind of float.
It is not a flag-based value.
The kinds of floats
correspond to both their size, and the encoding.
This covers all of the basic C
intrinsic floating point types.
The following are the different kinds of floats
represented in the encoding:
.Bd -literal -offset indent
#define	CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding */
#define	CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding */
#define	CTF_FP_CPLX	3	/* Complex encoding */
#define	CTF_FP_DCPLX	4	/* Double complex encoding */
#define	CTF_FP_LDCPLX	5	/* Long double complex encoding */
#define	CTF_FP_LDOUBLE	6	/* Long double encoding */
#define	CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding */
#define	CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding */
#define	CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding */
#define	CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding */
#define	CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding */
#define	CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding */
.Ed
.Ss Encoding of Arrays
Arrays, which are of type
.Sy CTF_K_ARRAY ,
have no variable length arguments.
They are followed by a structure which
describes the number of elements in the array, the type identifier of the
elements in the array, and the type identifier of the index of the array.
With
arrays, the
.Em ctt_size
member is set to zero.
The structure that follows an array is defined as:
.Bd -literal
typedef struct ctf_array {
	ushort_t cta_contents;	/* reference to type of array contents */
	ushort_t cta_index;	/* reference to type of array index */
	uint_t cta_nelems;	/* number of elements */
} ctf_array_t;
.Ed
.Lp
The
.Em cta_contents
and
.Em cta_index
members of the
.Sy ctf_array_t
are type identifiers which are encoded as per the section
.Sx Type Identifiers .
The member
.Em cta_nelems
is a simple four byte unsigned count of the number of elements.
This count may
be zero when encountering C99's flexible array members.
.Ss Encoding of Functions
Function types, which are of type
.Sy CTF_K_FUNCTION ,
use the variable length list to be the number of arguments in the function.
When
the function has a final member which is a varargs, then the argument count is
incremented by one to account for the variable argument.
Here, the
.Em ctt_type
member is encoded with the type identifier of the return type of the function.
Note that the
.Em ctt_size
member is not used here.
.Lp
The variable argument list contains the type identifiers for the arguments of
the function, if any.
Each one is represented by a
.Sy uint16_t
and encoded according to the
.Sx Type Identifiers
section.
If the function's last argument is of type varargs, then it is also
written out, but the type identifier is zero.
This is included in the count of
the function's arguments.
.Ss Encoding of Structures and Unions
Structures and Unions, which are encoded with
.Sy CTF_K_STRUCT
and
.Sy CTF_K_UNION
respectively,  are very similar constructs in C.
The main difference
between them is the fact that every member of a structure follows one another,
where as in a union, all members share the same memory.
They are also very
similar in terms of their encoding in
.Nm .
The variable length argument for structures and unions represents the number of
members that they have.
The value of the member
.Em ctt_size
is the size of the structure and union.
There are two different structures which
are used to encode members in the variable list.
When the size of a structure or
union is greater than or equal to the large member threshold, 8192, then a
different structure is used to encode the member, all members are encoded using
the same structure.
The structure for members is as follows:
.Bd -literal
typedef struct ctf_member {
	uint_t ctm_name;	/* reference to name in string table */
	ushort_t ctm_type;	/* reference to type of member */
	ushort_t ctm_offset;	/* offset of this member in bits */
} ctf_member_t;

typedef struct ctf_lmember {
	uint_t ctlm_name;	/* reference to name in string table */
	ushort_t ctlm_type;	/* reference to type of member */
	ushort_t ctlm_pad;	/* padding */
	uint_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
	uint_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
} ctf_lmember_t;
.Ed
.Lp
Both the
.Em ctm_name
and
.Em ctlm_name
refer to the name of the member.
The name is encoded as an offset into the
string table as described by the section
.Sx String Identifiers .
The members
.Sy ctm_type
and
.Sy ctlm_type
both refer to the type of the member.
They are encoded as per the section
.Sx Type Identifiers .
.Lp
The last piece of information that is present is the offset which describes the
offset in memory that the member begins at.
For unions, this value will always
be zero because the start of unions in memory is always zero.
For structures,
this is the offset in
.Sy bits
that the member begins at.
Note that a compiler may lay out a type with padding.
This means that the difference in offset between two consecutive members may be
larger than the size of the member.
When the size of the overall structure is
strictly less than 8192 bytes, the normal structure,
.Sy ctf_member_t ,
is used and the offset in bits is stored in the member
.Em ctm_offset .
However, when the size of the structure is greater than or equal to 8192 bytes,
then the number of bits is split into two 32-bit quantities.
One member,
.Em ctlm_offsethi ,
represents the upper 32 bits of the offset, while the other member,
.Em ctlm_offsetlo ,
represents the lower 32 bits of the offset.
These can be joined together to get
a 64-bit sized offset in bits by shifting the member
.Em ctlm_offsethi
to the left by thirty two and then doing a binary or of
.Em ctlm_offsetlo .
.Ss Encoding of Enumerations
Enumerations, noted by the type
.Sy CTF_K_ENUM ,
are similar to structures.
Enumerations use the variable list to note the number
of values that the enumeration contains, which we'll term enumerators.
In C, an
enumeration is always equivalent to the intrinsic type
.Sy int ,
thus the value of the member
.Em ctt_size
is always the size of an integer which is determined based on the current model.
For illumos systems, this will always be 4, as an integer is always defined to
be 4 bytes large in both
.Sy ILP32
and
.Sy LP64 ,
regardless of the architecture.
.Lp
The enumerators encoded in an enumeration have the following structure in the
variable list:
.Bd -literal
typedef struct ctf_enum {
	uint_t cte_name;	/* reference to name in string table */
	int cte_value;		/* value associated with this name */
} ctf_enum_t;
.Ed
.Pp
The member
.Em cte_name
refers to the name of the enumerator's value, it is encoded according to the
rules in the section
.Sx String Identifiers .
The member
.Em cte_value
contains the integer value of this enumerator.
.Ss Encoding of Forward References
Forward references, types of kind
.Sy CTF_K_FORWARD ,
in a
.Nm
file refer to types which may not have a definition at all, only a name.
If
the
.Nm
file is a child, then it may be that the forward is resolved to an
actual type in the parent, otherwise the definition may be in another
.Nm
container or may not be known at all.
The only member of the
.Sy ctf_type_t
that matters for a forward declaration is the
.Em ctt_name
which points to the name of the forward reference in the string table as
described earlier.
There is no other information recorded for forward
references.
.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict
Pointers, typedefs, volatile, const, and restrict are all similar in
.Nm .
They all refer to another type.
In the case of typedefs, they provide an
alternate name, while volatile, const, and restrict change how the type is
interpreted in the C programming language.
This covers the
.Nm
kinds
.Sy CTF_K_POINTER ,
.Sy CTF_K_TYPEDEF ,
.Sy CTF_K_VOLATILE ,
.Sy CTF_K_RESTRICT ,
and
.Sy CTF_K_CONST .
.Lp
These types have no variable list entries and use the member
.Em ctt_type
to refer to the base type that they modify.
.Ss Encoding of Unknown Types
Types with the kind
.Sy CTF_K_UNKNOWN
are used to indicate gaps in the type identifier space.
These entries consume an
identifier, but do not define anything.
Nothing should refer to these gap
identifiers.
.Ss Dependencies Between Types
C types can be imagined as a directed, cyclic, graph.
Structures and unions may
refer to each other in a way that creates a cyclic dependency.
In cases such as
these, the entire type section must be read in and processed.
Consumers must
not assume that every type can be laid out in dependency order; they
cannot.
.Ss The String Section
The last section of the
.Nm
file is the
.Sy string
section.
This section encodes all of the strings that appear throughout
the other sections.
It is laid out as a series of characters followed by
a null terminator.
Generally, all names are written out in ASCII, as
most C compilers do not allow and characters to appear in identifiers
outside of a subset of ASCII.
However, any extended characters sets
should be written out as a series of UTF-8 bytes.
.Lp
The first entry in the section, at offset zero, is a single null
terminator to reference the empty string.
Following that, each C string
should be written out, including the null terminator.
Offsets that refer
to something in this section should refer to the first byte which begins
a string.
Beyond the first byte in the section being the null
terminator, the order of strings is unimportant.
.Ss Data Encoding and ELF Considerations
.Nm
data is generally included in ELF objects which specify information to
identify the architecture and endianness of the file.
A
.Nm
container inside such an object must match the endianness of the ELF
object.
Aside from the question of the endian encoding of data, there
should be no other differences between architectures.
While many of the
types in this document refer to non-fixed size C integral types, they
are equivalent in the models
.Sy ILP32
and
.Sy LP64 .
If any other model is being used with
.Nm
data that has different sizes, then it must not use the model's sizes for
those integral types and instead use the fixed size equivalents based on an
.Sy ILP32
environment.
.Lp
When placing a
.Nm
container inside of an ELF object, there are certain conventions that are
expected for the purposes of tooling being able to find the
.Nm
data.
In particular, a given ELF object should only contain a single
.Nm
section.
Multiple containers should be merged together into a single
one.
.Lp
The
.Nm
file should be included in its own ELF section.
The section's name
must be
.Ql .SUNW_ctf .
The type of the section must be
.Sy SHT_PROGBITS .
The section should have a link set to the symbol table and its address
alignment must be 4.
.Sh SEE ALSO
.Xr dtrace 1 ,
.Xr elf 3 ,
.Xr gelf 3 ,
.Xr a.out 5 ,
.Xr elf 5