The following text is a brief overview of those key
principles which are useful to know when generating code
with SLJIT. Further details can be found in sljitLir.h.
----------------------------------------------------------------
What is SLJIT?
----------------------------------------------------------------
SLJIT is a platform-independent assembler which
- provides access to common CPU features
- can be easily ported to widespread CPU
architectures (e.g. x86, ARM, POWER, MIPS, SPARC)
The key challenge of this project is finding a common
subset of CPU features which
- covers traditional assembly level programming
- can be translated to machine code efficiently
This aim is achieved by selecting those instructions / CPU
features which are either available on all platforms or
can be simulated with a low performance overhead.
For example, some SLJIT instructions support base register
pre-update when the [base+offs] memory addressing mode is used.
Although this feature is only available on ARM and POWER
CPUs, the simulation overhead is low on other CPUs.
----------------------------------------------------------------
The generic CPU model of SLJIT
----------------------------------------------------------------
The CPU has
- integer registers, which can store either an
int32_t (4 byte) or intptr_t (4 or 8 byte) value
- floating point registers, which can store either a
single (4 byte) or double (8 byte) precision value
- boolean status flags
*** Integer registers:
The most important rule is: when a source operand of
an instruction is a register, the data type of the
register must match the data type expected by an
instruction.
For example, the following code snippet
is a valid instruction sequence:
sljit_emit_op1(compiler, SLJIT_IMOV,
SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0);
// An int32_t value is loaded into SLJIT_R0
sljit_emit_op1(compiler, SLJIT_INEG,
SLJIT_R0, 0, SLJIT_R0, 0);
// the int32_t value in SLJIT_R0 is negated
// and the type of the result is still int32_t
The next code snippet is not allowed:
sljit_emit_op1(compiler, SLJIT_MOV,
SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0);
// An intptr_t value is loaded into SLJIT_R0
sljit_emit_op1(compiler, SLJIT_INEG,
SLJIT_R0, 0, SLJIT_R0, 0);
// The result of the SLJIT_INEG instruction
// is undefined. Even a crash is possible
// (e.g. on MIPS-64).
However, it is always allowed to overwrite a
register regardless of its previous value:
sljit_emit_op1(compiler, SLJIT_MOV,
SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0);
// An intptr_t value is loaded into SLJIT_R0
sljit_emit_op1(compiler, SLJIT_IMOV,
SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R2), 0);
// From now on SLJIT_R0 contains an int32_t
// value. The previous value is discarded.
Type conversion instructions are provided to convert an
int32_t value to an intptr_t value and vice versa. In
certain architectures these conversions are nops (no
instructions are emitted).
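As a rough analogy only, the plain C casts below behave like
these conversions. This is not SLJIT code: the function names
(to_word, to_int32, conversion_is_nop) are hypothetical and
exist only for this sketch.

```c
#include <stdint.h>

/* int32_t -> intptr_t: a sign extension where intptr_t is
   8 bytes, a plain copy where intptr_t is 4 bytes. */
intptr_t to_word(int32_t v) { return (intptr_t)v; }

/* intptr_t -> int32_t: the value is preserved as long as
   it fits into 32 bits. */
int32_t to_int32(intptr_t v) { return (int32_t)v; }

/* Returns 1 when both types have the same size, which is
   the case where a conversion needs no work at all. */
int conversion_is_nop(void)
{
    return sizeof(intptr_t) == sizeof(int32_t);
}
```

On a target where conversion_is_nop() returns 1, both
conversions leave the value untouched, which corresponds to the
case where no instructions need to be emitted.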
Memory accessing:
Register arguments of SLJIT_MEM1 / SLJIT_MEM2 addressing
modes must contain intptr_t data.
Signed / unsigned values:
Most operations are executed in the same way regardless of
whether the value is signed or unsigned. These operations have
only one instruction form (e.g. SLJIT_ADD / SLJIT_MUL).
Instructions where the result depends on the sign have
two forms (e.g. integer division, long multiply).
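The difference can be checked in plain C. The sketch below is
not SLJIT code and all function names in it are hypothetical.
It shows that addition and (low word) multiplication yield the
same bit pattern for signed and unsigned operands, while
division and long multiply do not.

```c
#include <stdint.h>

/* One form is enough: the resulting bits are the same
   whether the operands are viewed as signed or unsigned. */
uint32_t add_bits(uint32_t a, uint32_t b) { return a + b; }
uint32_t mul_bits(uint32_t a, uint32_t b) { return a * b; }

/* Division depends on the sign: two forms are needed. */
uint32_t div_unsigned(uint32_t a, uint32_t b) { return a / b; }
int32_t div_signed(int32_t a, int32_t b) { return a / b; }

/* Long multiply (full 64 bit product of two 32 bit
   values) depends on the sign as well. */
uint64_t lmul_unsigned(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

int64_t lmul_signed(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

For instance, the bit pattern 0xFFFFFFFA is -6 when read as
signed and 4294967290 when read as unsigned. Dividing it by 2
gives -3 with div_signed but 0x7FFFFFFD with div_unsigned,
whereas mul_bits(0xFFFFFFFE, 3) gives 0xFFFFFFFA under either
interpretation.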
*** Floating point registers
Floating point registers can either contain a single
or double precision value. Similar to integer registers,
the data type of the value stored in a source register
must match the data type expected by the instruction.
Otherwise the result is undefined (even a crash is possible).
Rounding:
Similar to standard C, floating point computation
results are rounded toward zero.
*** Boolean status flags:
Conditional branches usually depend on the value
of CPU status flags. These status flags are boolean
values and can be set by certain instructions.
To achieve maximum efficiency and portability, the
following rules were introduced:
- Most instructions can freely modify these status
flags except if SLJIT_KEEP_FLAGS is passed.
- The SLJIT_KEEP_FLAGS option may have a performance
overhead, so it should only be used when necessary.
- The SLJIT_SET_E, SLJIT_SET_U, etc. options can
force an instruction to correctly set the
specified status flags. However, all other
status flags are undefined. This rule must
always be kept in mind!
- Status flags cannot be controlled directly
(there are no set/clear/invert operations)
The last two rules allow efficient mapping of status flags.
For example, the arithmetic and multiply overflow flags are
mapped to the same overflow flag bit on x86. This is allowed,
since no instruction can set both of these flags. When
either of them is set by an instruction, the other can
have any value (this satisfies the "all other flags are
undefined" rule). Therefore mapping two SLJIT flags to the
same CPU flag is possible. Even though SLJIT supports
a dozen status flags, they can be efficiently mapped
to CPUs with only 4 status flags (e.g. ARM or SPARC).
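The idea of sharing one CPU flag can be illustrated with a toy
model in plain C. Everything below is hypothetical (the
toy_cpu structure and the toy_* functions are invented for this
sketch and are not part of SLJIT): two logical flags, arithmetic
overflow and multiply overflow, are stored in the same bit,
which is safe because each operation defines only the flag it
was asked to set.

```c
#include <stdint.h>

/* Hypothetical CPU model with a single overflow bit. */
struct toy_cpu {
    int overflow;
};

/* 32 bit add which sets the arithmetic overflow flag.
   Returns the low 32 bits of the result. */
uint32_t toy_add_set_overflow(struct toy_cpu *cpu,
    int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    cpu->overflow = (wide > INT32_MAX || wide < INT32_MIN);
    return (uint32_t)wide;
}

/* 32 bit multiply which sets the multiply overflow flag.
   It reuses the same bit as the add above. */
uint32_t toy_mul_set_overflow(struct toy_cpu *cpu,
    int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    cpu->overflow = (wide > INT32_MAX || wide < INT32_MIN);
    return (uint32_t)wide;
}

/* Both "arithmetic overflow" and "multiply overflow"
   conditions read the same bit. Only the flag requested
   by the most recent operation is meaningful. */
int toy_overflow_is_set(const struct toy_cpu *cpu)
{
    return cpu->overflow;
}
```

After toy_add_set_overflow, the bit answers the arithmetic
overflow question and says nothing reliable about multiply
overflow, and vice versa. This is the behaviour the "all other
flags are undefined" rule permits.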
----------------------------------------------------------------
Complex instructions
----------------------------------------------------------------
We noticed that introducing complex instructions for common
tasks can improve performance. For example, compare and
branch instruction sequences can be optimized if certain
conditions apply, but these conditions depend on the target
CPU. SLJIT can do these optimizations, but it needs to
understand the "purpose" of the generated code. However,
static instruction analysis has a large performance
overhead, so we chose another approach: we introduced
complex instruction forms for certain non-atomic tasks.
SLJIT can optimize these "instructions" more efficiently
since the "purpose" is known to the compiler. These complex
instruction forms can often be assembled from other SLJIT
instructions, but we recommend using them since the
compiler can optimize them on certain CPUs.
----------------------------------------------------------------
Generating functions
----------------------------------------------------------------
SLJIT is often used for generating function bodies which are
called from C. SLJIT provides two complex instructions for
generating function entry and return: sljit_emit_enter and
sljit_emit_return. sljit_emit_enter also initializes the
"compiling context", which specifies the current register
mapping, local space size, and other configuration.
sljit_set_context can also set this context without emitting
any machine instructions.
This context is important since it affects the compiler, so
the first instruction after a compiler is created must be
either sljit_emit_enter or sljit_set_context. The context can
be changed by calling sljit_emit_enter or sljit_set_context
again.
----------------------------------------------------------------
All-in-one building
----------------------------------------------------------------
Instead of using a separate library, the whole SLJIT
compiler infrastructure can be directly included:
#define SLJIT_CONFIG_STATIC 1
#include "sljitLir.c"
This approach is useful for single file compilers.
Advantages:
- Everything provided by SLJIT is available
(no need to include anything else).
- Configuring SLJIT is easy
(e.g. redefining SLJIT_MALLOC / SLJIT_FREE).
- The SLJIT compiler API is hidden from the
world, which improves security.
- The C compiler can optimize the SLJIT code
generator (e.g. removing unused functions).
----------------------------------------------------------------
Types and macros
----------------------------------------------------------------
sljitConfig.h contains the defines which control
the compiler. The beginning of sljitConfigInternal.h
lists architecture specific types and macros provided
by SLJIT. Some of these macros:
SLJIT_DEBUG : enabled by default
Enables assertions. Should be disabled in release mode.
SLJIT_VERBOSE : enabled by default
When this macro is enabled, the sljit_compiler_verbose
function can be used to dump SLJIT instructions.
Otherwise this function is not available. Should be
disabled in release mode.
SLJIT_SINGLE_THREADED : disabled by default
Single threaded programs can define this flag which
eliminates the pthread dependency.
sljit_sw, sljit_uw, etc. :
It is recommended to use these types instead of long,
intptr_t, etc. Improves readability / portability of
the code.
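
Combining the all-in-one approach with the macros listed above,
a release-mode configuration might look like the fragment below.
This is a sketch which assumes that defining SLJIT_DEBUG and
SLJIT_VERBOSE to 0 before the include is what disables them;
check sljitConfig.h of the SLJIT version in use before relying
on it.

```c
/* Assumption: a macro defined as 0 counts as disabled. */
#define SLJIT_CONFIG_STATIC 1
#define SLJIT_DEBUG 0    /* no assertions in release mode */
#define SLJIT_VERBOSE 0  /* sljit_compiler_verbose unavailable */
/* Only for single threaded programs: */
/* #define SLJIT_SINGLE_THREADED 1 */
#include "sljitLir.c"
```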