SSE Instructions
SSE instructions are an extension of the SIMD execution model introduced with the MMX technology. SSE instructions are divided into four subgroups:
- SIMD single-precision floating-point instructions that operate on the XMM registers
- MXCSR state management instructions
- 64-bit SIMD integer instructions that operate on the MMX registers
- Instructions that provide cache control, prefetch, and instruction ordering functionality
SIMD Single-Precision Floating-Point Instructions (SSE)
The SSE SIMD instructions operate on packed and scalar single-precision floating-point values located
in the XMM registers or memory.
Data Transfer Instructions (SSE)
The SSE data transfer instructions move packed and scalar single-precision floating-point operands between
XMM registers and between XMM registers and memory.
Table 3-27 Data Transfer Instructions (SSE)

| Instruction | Description |
| --- | --- |
| MOVAPS | move four aligned packed single-precision floating-point values between XMM registers or memory |
| MOVHLPS | move two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of another XMM register |
| MOVHPS | move two packed single-precision floating-point values to or from the high quadword of an XMM register or memory |
| MOVLHPS | move two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of another XMM register |
| MOVLPS | move two packed single-precision floating-point values to or from the low quadword of an XMM register or memory |
| MOVMSKPS | extract sign mask from four packed single-precision floating-point values |
| MOVSS | move scalar single-precision floating-point value between XMM registers or memory |
| MOVUPS | move four unaligned packed single-precision floating-point values between XMM registers or memory |

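The following is a minimal sketch of the aligned/unaligned distinction and of MOVMSKPS. It is written in C with the `<xmmintrin.h>` intrinsics that GCC, Clang, and MSVC provide on x86. The function names are hypothetical. The instruction named in each comment is the usual mapping, but the compiler chooses the final encoding.

```c
#include <xmmintrin.h>

/* Aligned copy of four floats (typically MOVAPS). Both pointers must be
   16-byte aligned, otherwise the access faults. */
void copy4_aligned(float *dst, const float *src)
{
    __m128 v = _mm_load_ps(src);
    _mm_store_ps(dst, v);
}

/* Same copy with unaligned accesses (typically MOVUPS); no alignment needed. */
void copy4_unaligned(float *dst, const float *src)
{
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, v);
}

/* MOVMSKPS: bit i of the result is the sign bit of src[i]. */
int sign_mask4(const float *src)
{
    return _mm_movemask_ps(_mm_loadu_ps(src));
}
```

For `{1.0f, -2.0f, 3.0f, -4.0f}`, `sign_mask4` returns `0xA`, because lanes 1 and 3 are negative.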
Packed Arithmetic Instructions (SSE)
SSE packed arithmetic instructions perform packed and scalar arithmetic operations on packed and scalar
single-precision floating-point operands.
Table 3-28 Packed Arithmetic Instructions (SSE)

| Instruction | Description |
| --- | --- |
| ADDPS | add packed single-precision floating-point values |
| ADDSS | add scalar single-precision floating-point values |
| DIVPS | divide packed single-precision floating-point values |
| DIVSS | divide scalar single-precision floating-point values |
| MAXPS | return maximum packed single-precision floating-point values |
| MAXSS | return maximum scalar single-precision floating-point value |
| MINPS | return minimum packed single-precision floating-point values |
| MINSS | return minimum scalar single-precision floating-point value |
| MULPS | multiply packed single-precision floating-point values |
| MULSS | multiply scalar single-precision floating-point values |
| RCPPS | compute approximate reciprocals of packed single-precision floating-point values |
| RCPSS | compute approximate reciprocal of scalar single-precision floating-point value |
| RSQRTPS | compute approximate reciprocals of square roots of packed single-precision floating-point values |
| RSQRTSS | compute approximate reciprocal of square root of scalar single-precision floating-point value |
| SQRTPS | compute square roots of packed single-precision floating-point values |
| SQRTSS | compute square root of scalar single-precision floating-point value |
| SUBPS | subtract packed single-precision floating-point values |
| SUBSS | subtract scalar single-precision floating-point values |

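The packed (PS) and scalar (SS) forms differ only in how many lanes they touch. This sketch uses the same C intrinsics and a hypothetical `saxpy_sse` helper to compute `y = a*x + y`. The main loop handles four elements at a time with MULPS/ADDPS, and any leftover elements go through MULSS/ADDSS, which operate on the low lane only.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy_sse(size_t n, float a, const float *x, float *y)
{
    __m128 va = _mm_set1_ps(a);               /* broadcast a to all four lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);   /* MULPS, then ADDPS */
        _mm_storeu_ps(y + i, vy);
    }
    for (; i < n; i++) {
        /* Scalar tail: MULSS/ADDSS compute only the low lane. */
        __m128 vx = _mm_load_ss(x + i);
        __m128 vy = _mm_load_ss(y + i);
        vy = _mm_add_ss(_mm_mul_ss(va, vx), vy);
        _mm_store_ss(y + i, vy);
    }
}
```

With `x = {1, 2, 3, 4, 5, 6}`, every `y[i] = 1`, and `a = 2`, the result is `{3, 5, 7, 9, 11, 13}`. The first four elements come from the packed loop and the last two from the scalar tail.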
Comparison Instructions (SSE)
The SSE compare instructions compare packed and scalar single-precision floating-point operands.
Table 3-29 Comparison Instructions (SSE)

| Instruction | Description |
| --- | --- |
| CMPPS | compare packed single-precision floating-point values |
| CMPSS | compare scalar single-precision floating-point values |
| COMISS | perform ordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register |
| UCOMISS | perform unordered comparison of scalar single-precision floating-point values and set flags in EFLAGS register |

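CMPPS does not set flags. Instead it writes an all-ones or all-zeros mask into each lane, and that mask is usually consumed by MOVMSKPS or by the logical instructions. COMISS and UCOMISS, by contrast, set EFLAGS for a single scalar comparison. The sketch below assumes the C intrinsics, and its helper names are hypothetical.

```c
#include <xmmintrin.h>

/* Count how many of four floats are less than t. CMPPS with the LT
   predicate yields an all-ones lane where p[i] < t (false for NaN lanes);
   MOVMSKPS then packs one bit per lane into an int. */
int count_less4(const float *p, float t)
{
    __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(p), _mm_set1_ps(t));
    int bits = _mm_movemask_ps(mask);
    int count = 0;
    while (bits) {
        count += bits & 1;
        bits >>= 1;
    }
    return count;
}

/* COMISS-based scalar test: 1 if a < b, 0 otherwise (ordered operands). */
int less_scalar(float a, float b)
{
    return _mm_comilt_ss(_mm_set_ss(a), _mm_set_ss(b));
}
```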
Logical Instructions (SSE)
The SSE logical instructions perform bitwise AND, AND NOT, OR, and XOR operations
on packed single-precision floating-point operands.
Table 3-30 Logical Instructions (SSE)

| Instruction | Description |
| --- | --- |
| ANDNPS | perform bitwise logical AND NOT of packed single-precision floating-point values |
| ANDPS | perform bitwise logical AND of packed single-precision floating-point values |
| ORPS | perform bitwise logical OR of packed single-precision floating-point values |
| XORPS | perform bitwise logical XOR of packed single-precision floating-point values |

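The logical instructions operate on raw bits, so they are commonly used for sign manipulation and for branch-free selection. The sketch below assumes the C intrinsics and uses hypothetical function names. It uses `-0.0f`, whose only set bit is the sign bit, as a mask. `select_greater4` computes the same thing MAXPS already does; it is included only to illustrate the CMPPS + ANDPS/ANDNPS/ORPS select idiom.

```c
#include <xmmintrin.h>

/* Absolute value: ANDNPS computes (~sign) & x, clearing each sign bit. */
void abs4(const float *src, float *dst)
{
    __m128 sign = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(dst, _mm_andnot_ps(sign, _mm_loadu_ps(src)));
}

/* Negation: XORPS flips each sign bit. */
void neg4(const float *src, float *dst)
{
    __m128 sign = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(dst, _mm_xor_ps(sign, _mm_loadu_ps(src)));
}

/* Per-lane select: dst[i] = (a[i] > b[i]) ? a[i] : b[i]. */
void select_greater4(const float *a, const float *b, float *dst)
{
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
    __m128 m = _mm_cmpgt_ps(va, vb);                 /* all-ones where a > b */
    __m128 r = _mm_or_ps(_mm_and_ps(m, va),          /* keep a where mask set */
                         _mm_andnot_ps(m, vb));      /* keep b elsewhere */
    _mm_storeu_ps(dst, r);
}
```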
Shuffle and Unpack Instructions (SSE)
The SSE shuffle and unpack instructions shuffle or interleave single-precision floating-point values in packed
single-precision floating-point operands.
Table 3-31 Shuffle and Unpack Instructions (SSE)

| Instruction | Description |
| --- | --- |
| SHUFPS | shuffle values in packed single-precision floating-point operands |
| UNPCKHPS | unpack and interleave the two high-order values from two single-precision floating-point operands |
| UNPCKLPS | unpack and interleave the two low-order values from two single-precision floating-point operands |

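Here is a sketch of SHUFPS and the unpack instructions using the C intrinsics; the function names are hypothetical. For SHUFPS, the two low result lanes are selected from the first operand and the two high result lanes from the second. Passing the same register twice therefore permutes a single vector.

```c
#include <xmmintrin.h>

/* Reverse four floats with SHUFPS. _MM_SHUFFLE(z, y, x, w) selects source
   lane w for result lane 0, x for lane 1, y for lane 2, and z for lane 3. */
void reverse4(const float *src, float *dst)
{
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* UNPCKLPS gives {a0, b0, a1, b1}; UNPCKHPS gives {a2, b2, a3, b3}. */
void interleave4(const float *a, const float *b, float *lo, float *hi)
{
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
    _mm_storeu_ps(lo, _mm_unpacklo_ps(va, vb));
    _mm_storeu_ps(hi, _mm_unpackhi_ps(va, vb));
}
```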
Conversion Instructions (SSE)
The SSE conversion instructions convert packed and individual doubleword integers into packed and scalar single-precision floating-point values, and vice versa.
Table 3-32 Conversion Instructions (SSE)

| Instruction | Description |
| --- | --- |
| CVTPI2PS | convert packed doubleword integers to packed single-precision floating-point values |
| CVTPS2PI | convert packed single-precision floating-point values to packed doubleword integers |
| CVTSI2SS | convert doubleword integer to scalar single-precision floating-point value |
| CVTSS2SI | convert scalar single-precision floating-point value to a doubleword integer |
| CVTTPS2PI | convert with truncation packed single-precision floating-point values to packed doubleword integers |
| CVTTSS2SI | convert with truncation scalar single-precision floating-point value to scalar doubleword integer |

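CVTSS2SI and CVTTSS2SI differ only in their rounding rule. CVTSS2SI follows the rounding mode in MXCSR, while CVTTSS2SI always truncates toward zero. The sketch below assumes the C intrinsics and the default MXCSR rounding mode (round to nearest, ties to even); its function names are hypothetical.

```c
#include <xmmintrin.h>

/* CVTSS2SI: rounds according to MXCSR (round-to-nearest-even by default). */
int to_int_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* CVTTSS2SI: always truncates toward zero, regardless of MXCSR. */
int to_int_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* CVTSI2SS: converts a 32-bit integer into the low lane of an XMM value. */
float from_int(int i)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), i));
}
```

Under the default mode, 2.5f rounds to 2 and 3.5f rounds to 4, because ties go to the even integer. The truncating form turns both 2.7f and -2.7f into 2 and -2.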
MXCSR State Management Instructions (SSE)
The MXCSR state management instructions save and restore the state of the MXCSR
control and status register.
Table 3-33 MXCSR State Management Instructions (SSE)

| Instruction | Description |
| --- | --- |
| LDMXCSR | load %mxcsr register |
| STMXCSR | save %mxcsr register state |

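The next sketch saves MXCSR, changes it, and restores it around a conversion. It uses the `_mm_getcsr` and `_mm_setcsr` intrinsics, which correspond to STMXCSR and LDMXCSR. `convert_with_rounding` is a hypothetical helper. The `volatile` variables are a pragmatic way to stop the compiler from evaluating the conversion at build time or moving it outside the window where the new mode is active. Standard C does not guarantee this mechanism.

```c
#include <xmmintrin.h>

/* Convert x to int under a temporarily selected rounding mode. mode is one
   of _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO. */
int convert_with_rounding(float x, unsigned int mode)
{
    unsigned int saved = _mm_getcsr();                   /* STMXCSR */
    _mm_setcsr((saved & ~_MM_ROUND_MASK) | mode);        /* LDMXCSR: new RC field */

    /* Volatile read and write pin CVTSS2SI between the two MXCSR writes. */
    volatile float in = x;
    volatile int out = _mm_cvtss_si32(_mm_set_ss(in));

    _mm_setcsr(saved);                                   /* restore caller's state */
    return out;
}
```

Restoring the saved value, instead of writing a fixed constant back, preserves any other MXCSR settings the caller made, such as the exception masks or flush-to-zero.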
64-Bit SIMD Integer Instructions (SSE)
The SSE 64-bit SIMD integer instructions perform operations on packed bytes, words, or doublewords in MMX registers.
Table 3-34 64-Bit SIMD Integer Instructions (SSE)

| Instruction | Description |
| --- | --- |
| PAVGB | compute average of packed unsigned byte integers |
| PAVGW | compute average of packed unsigned word integers |
| PEXTRW | extract word |
| PINSRW | insert word |
| PMAXSW | maximum of packed signed word integers |
| PMAXUB | maximum of packed unsigned byte integers |
| PMINSW | minimum of packed signed word integers |
| PMINUB | minimum of packed unsigned byte integers |
| PMOVMSKB | move byte mask |
| PMULHUW | multiply packed unsigned word integers and store high result |
| PSADBW | compute sum of absolute differences |
| PSHUFW | shuffle packed word integers in MMX register |

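The averaging and sum-of-absolute-differences instructions are easiest to understand through their per-byte arithmetic. What follows is a portable scalar reference model in C, not an MMX implementation. The function names are hypothetical, and each function processes the eight bytes of one 64-bit operand the way the corresponding instruction does.

```c
#include <stdint.h>
#include <stdlib.h>

/* Model of PAVGB: each result byte is (a + b + 1) >> 1, the average rounded
   up. The sum is widened to unsigned, so 255 + 255 + 1 cannot overflow. */
void pavgb_model(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(((unsigned)a[i] + b[i] + 1) >> 1);
}

/* Model of PSADBW: sum of |a[i] - b[i]| over the 8 byte pairs. The
   instruction stores this sum in the low word of the destination and clears
   the remaining bits; the maximum, 8 * 255 = 2040, fits in 16 bits. */
uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (unsigned)abs((int)a[i] - (int)b[i]);
    return (uint16_t)sum;
}
```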
Miscellaneous Instructions (SSE)
The following instructions control caching, prefetching, and instruction ordering.
Table 3-35 Miscellaneous Instructions (SSE)

| Instruction | Description |
| --- | --- |
| MASKMOVQ | non-temporal store of selected bytes from an MMX register into memory |
| MOVNTPS | non-temporal store of four packed single-precision floating-point values from an XMM register into memory |
| MOVNTQ | non-temporal store of quadword from an MMX register into memory |
| PREFETCHNTA | prefetch data into non-temporal cache structure and into a location close to the processor |
| PREFETCHT0 | prefetch data into all levels of the cache hierarchy |
| PREFETCHT1 | prefetch data into level 2 cache and higher |
| PREFETCHT2 | prefetch data into level 3 cache and higher, or an implementation-specific choice |
| SFENCE | serialize store operations |

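These instructions are typically combined in large copies whose destination will not be read again soon. The sketch below assumes the C intrinsics. `stream_copy` is a hypothetical helper, and its 64-float prefetch distance is an arbitrary illustrative value, not a tuned one. It prefetches the source with PREFETCHNTA, writes the destination with MOVNTPS, and ends with SFENCE, because non-temporal stores are weakly ordered.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Copy n floats using non-temporal stores. Assumption: dst is 16-byte
   aligned, as MOVNTPS requires; src may be unaligned. */
void stream_copy(float *dst, const float *src, size_t n)
{
    const size_t dist = 64;   /* hypothetical prefetch distance, in floats */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        if (i + dist < n)     /* only form pointers inside the source array */
            _mm_prefetch((const char *)(src + i + dist), _MM_HINT_NTA);
        _mm_stream_ps(dst + i, _mm_loadu_ps(src + i));   /* MOVNTPS */
    }
    for (; i < n; i++)        /* ordinary stores for the tail */
        dst[i] = src[i];
    /* SFENCE: all earlier stores, including the non-temporal ones, become
       globally visible before any store that follows the fence. */
    _mm_sfence();
}
```

A producer thread would normally publish a "data ready" flag only after this function returns, so that the fence orders the streamed data ahead of the flag.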