SSE2 Instructions
SSE2 instructions are an extension of the SIMD execution model introduced with the
MMX technology and the SSE extensions. SSE2 instructions are divided into four subgroups:
- Packed and scalar double-precision floating-point instructions
- Packed single-precision floating-point conversion instructions
- 128-bit SIMD integer instructions
- Instructions that provide cache control and instruction ordering functionality
SSE2 Packed and Scalar Double-Precision Floating-Point Instructions
The SSE2 packed and scalar double-precision floating-point instructions operate on double-precision floating-point operands.
SSE2 Data Movement Instructions
The SSE2 data movement instructions move double-precision floating-point data between XMM registers and
memory.
Table 3-36 SSE2 Data Movement Instructions
| Instruction | Description |
|---|---|
| MOVAPD | move two aligned packed double-precision floating-point values between XMM registers and memory |
| MOVHPD | move high packed double-precision floating-point value to or from the high quadword of an XMM register and memory |
| MOVLPD | move low packed double-precision floating-point value to or from the low quadword of an XMM register and memory |
| MOVMSKPD | extract sign mask from two packed double-precision floating-point values |
| MOVSD | move scalar double-precision floating-point value between XMM registers and memory |
| MOVUPD | move two unaligned packed double-precision floating-point values between XMM registers and memory |
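As an illustration of the data movement group, the following is a minimal C sketch that uses compiler intrinsics from `<emmintrin.h>` rather than assembler mnemonics. The function names are hypothetical, and the mapping from intrinsic to instruction noted in the comments is the usual one; a compiler is free to select a different encoding.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Copy two doubles with unaligned packed loads and stores
   (usually compiled to MOVUPD). */
void copy_pair_unaligned(double *dst, const double *src)
{
    __m128d v = _mm_loadu_pd(src);
    _mm_storeu_pd(dst, v);
}

/* Return the 2-bit sign mask of a pair (usually MOVMSKPD):
   bit 0 is the sign of p[0], bit 1 is the sign of p[1]. */
int sign_mask(const double *p)
{
    return _mm_movemask_pd(_mm_loadu_pd(p));
}

/* Replace only the high element of a pair with *hi
   (the load into the high quadword is usually MOVHPD). */
void replace_high(double *pair, const double *hi)
{
    __m128d v = _mm_loadu_pd(pair);
    v = _mm_loadh_pd(v, hi);
    _mm_storeu_pd(pair, v);
}
```

The unaligned forms are used here so that the sketch works for any pointer; the aligned intrinsics (`_mm_load_pd`, `_mm_store_pd`) require 16-byte aligned addresses.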
SSE2 Packed Arithmetic Instructions
The SSE2 arithmetic instructions operate on packed and scalar double-precision floating-point operands.
Table 3-37 SSE2 Packed Arithmetic Instructions
| Instruction | Description |
|---|---|
| ADDPD | add packed double-precision floating-point values |
| ADDSD | add scalar double-precision floating-point values |
| DIVPD | divide packed double-precision floating-point values |
| DIVSD | divide scalar double-precision floating-point values |
| MAXPD | return maximum packed double-precision floating-point values |
| MAXSD | return maximum scalar double-precision floating-point value |
| MINPD | return minimum packed double-precision floating-point values |
| MINSD | return minimum scalar double-precision floating-point value |
| MULPD | multiply packed double-precision floating-point values |
| MULSD | multiply scalar double-precision floating-point values |
| SQRTPD | compute packed square roots of packed double-precision floating-point values |
| SQRTSD | compute scalar square root of scalar double-precision floating-point value |
| SUBPD | subtract packed double-precision floating-point values |
| SUBSD | subtract scalar double-precision floating-point values |
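The difference between the packed (PD) and scalar (SD) forms can be sketched in C with intrinsics from `<emmintrin.h>`. The function names below are hypothetical, and the instruction names in the comments are the usual compiler choices, not a guarantee.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Packed form: out[i] = sqrt(a[i]*a[i] + b[i]*b[i]) for i = 0, 1.
   Both elements are processed by each operation. */
void hypot_pair(double *out, const double *a, const double *b)
{
    __m128d va  = _mm_loadu_pd(a);
    __m128d vb  = _mm_loadu_pd(b);
    __m128d sum = _mm_add_pd(_mm_mul_pd(va, va),   /* MULPD */
                             _mm_mul_pd(vb, vb));  /* MULPD, then ADDPD */
    _mm_storeu_pd(out, _mm_sqrt_pd(sum));          /* SQRTPD */
}

/* Scalar form: only the low element takes part in the addition (ADDSD). */
double add_scalar(double a, double b)
{
    __m128d r = _mm_add_sd(_mm_set_sd(a), _mm_set_sd(b));
    return _mm_cvtsd_f64(r);  /* extract the low element */
}
```

For example, `hypot_pair` applied to `a = {3, 5}` and `b = {4, 12}` stores `{5, 13}`.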
SSE2 Logical Instructions
The SSE2 logical instructions operate on packed double-precision floating-point values.
Table 3-38 SSE2 Logical Instructions
| Instruction | Description |
|---|---|
| ANDNPD | perform bitwise logical AND NOT of packed double-precision floating-point values |
| ANDPD | perform bitwise logical AND of packed double-precision floating-point values |
| ORPD | perform bitwise logical OR of packed double-precision floating-point values |
| XORPD | perform bitwise logical XOR of packed double-precision floating-point values |
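A common use of the logical instructions is sign-bit manipulation. The sketch below, in C with `<emmintrin.h>` intrinsics, assumes IEEE 754 doubles, where `-0.0` has only the sign bit set; the function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* out[i] = |in[i]|: clear the sign bit with AND NOT (usually ANDNPD).
   _mm_andnot_pd(m, v) computes (~m) & v. */
void abs_pair(double *out, const double *in)
{
    const __m128d sign = _mm_set1_pd(-0.0);  /* sign bit set in each element */
    _mm_storeu_pd(out, _mm_andnot_pd(sign, _mm_loadu_pd(in)));
}

/* out[i] = -in[i]: flip the sign bit with XOR (usually XORPD). */
void negate_pair(double *out, const double *in)
{
    const __m128d sign = _mm_set1_pd(-0.0);
    _mm_storeu_pd(out, _mm_xor_pd(sign, _mm_loadu_pd(in)));
}
```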
SSE2 Compare Instructions
The SSE2 compare instructions compare packed and scalar double-precision floating-point values and return
the results of the comparison to either the destination operand or to the
EFLAGS register.
Table 3-39 SSE2 Compare Instructions
| Instruction | Description |
|---|---|
| CMPPD | compare packed double-precision floating-point values |
| CMPSD | compare scalar double-precision floating-point values |
| COMISD | perform ordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register |
| UCOMISD | perform unordered comparison of scalar double-precision floating-point values and set flags in EFLAGS register |
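The two result styles described above (a mask written to the destination operand versus flags in EFLAGS) can be sketched in C with `<emmintrin.h>` intrinsics. The function names are hypothetical. The packed compare yields an all-ones or all-zeros mask per element, which is then combined with logical operations; the COMISD-style intrinsic returns the flag-derived result as an `int`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* out[i] = a[i] if a[i] > b[i], otherwise b[i].
   The compare (a CMPPD variant) writes a per-element mask. */
void select_greater(double *out, const double *a, const double *b)
{
    __m128d va   = _mm_loadu_pd(a);
    __m128d vb   = _mm_loadu_pd(b);
    __m128d mask = _mm_cmpgt_pd(va, vb);              /* all ones where a > b */
    __m128d r    = _mm_or_pd(_mm_and_pd(mask, va),    /* keep a where mask set */
                             _mm_andnot_pd(mask, vb)); /* keep b elsewhere */
    _mm_storeu_pd(out, r);
}

/* Return 1 if a < b, otherwise 0, using a COMISD-based comparison
   of the low elements. */
int scalar_less(double a, double b)
{
    return _mm_comilt_sd(_mm_set_sd(a), _mm_set_sd(b));
}
```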
SSE2 Shuffle and Unpack Instructions
The SSE2 shuffle and unpack instructions operate on packed double-precision floating-point operands.
Table 3-40 SSE2 Shuffle and Unpack Instructions
| Instruction | Description |
|---|---|
| SHUFPD | shuffle values in packed double-precision floating-point operands |
| UNPCKHPD | unpack and interleave the high values from two packed double-precision floating-point operands |
| UNPCKLPD | unpack and interleave the low values from two packed double-precision floating-point operands |
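To show what shuffling and interleaving mean in practice, here is a small C sketch with `<emmintrin.h>` intrinsics. The function names and the row-major 2x2 layout are assumptions made for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Swap the two elements of a pair (usually SHUFPD with immediate 1):
   bit 0 of the immediate selects the low result from the first operand,
   bit 1 selects the high result from the second operand. */
void swap_pair(double *out, const double *in)
{
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_shuffle_pd(v, v, 1));
}

/* Transpose a row-major 2x2 matrix in place by interleaving
   the low and high elements of its two rows. */
void transpose2x2(double *m)
{
    __m128d r0 = _mm_loadu_pd(m);      /* {m00, m01} */
    __m128d r1 = _mm_loadu_pd(m + 2);  /* {m10, m11} */
    _mm_storeu_pd(m,     _mm_unpacklo_pd(r0, r1));  /* {m00, m10}, UNPCKLPD */
    _mm_storeu_pd(m + 2, _mm_unpackhi_pd(r0, r1));  /* {m01, m11}, UNPCKHPD */
}
```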
SSE2 Conversion Instructions
The SSE2 conversion instructions convert packed and individual doubleword integers into packed and
scalar double-precision floating-point values (and vice versa). These instructions also convert between packed and
scalar single-precision and double-precision floating-point values.
Table 3-41 SSE2 Conversion Instructions
| Instruction | Description |
|---|---|
| CVTDQ2PD | convert packed doubleword integers to packed double-precision floating-point values |
| CVTPD2DQ | convert packed double-precision floating-point values to packed doubleword integers |
| CVTPD2PI | convert packed double-precision floating-point values to packed doubleword integers |
| CVTPD2PS | convert packed double-precision floating-point values to packed single-precision floating-point values |
| CVTPI2PD | convert packed doubleword integers to packed double-precision floating-point values |
| CVTPS2PD | convert packed single-precision floating-point values to packed double-precision floating-point values |
| CVTSD2SI | convert scalar double-precision floating-point value to a doubleword integer |
| CVTSD2SS | convert scalar double-precision floating-point value to scalar single-precision floating-point value |
| CVTSI2SD | convert doubleword integer to scalar double-precision floating-point value |
| CVTSS2SD | convert scalar single-precision floating-point value to scalar double-precision floating-point value |
| CVTTPD2DQ | convert with truncation packed double-precision floating-point values to packed doubleword integers |
| CVTTPD2PI | convert with truncation packed double-precision floating-point values to packed doubleword integers |
| CVTTSD2SI | convert with truncation scalar double-precision floating-point value to a doubleword integer |
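The practical difference between the plain and the "with truncation" conversions is the rounding behavior. The C sketch below uses `<emmintrin.h>` intrinsics and assumes the MXCSR rounding mode has been left at its default (round to nearest, ties to even); the function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Convert using the current MXCSR rounding mode (usually CVTSD2SI). */
int round_to_int(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* Convert by truncating toward zero, regardless of MXCSR (usually CVTTSD2SI). */
int trunc_to_int(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* Widen two floats to two doubles (usually CVTPS2PD). */
void floats_to_doubles(double *out, const float *in)
{
    __m128 f = _mm_set_ps(0.0f, 0.0f, in[1], in[0]);  /* low two elements used */
    _mm_storeu_pd(out, _mm_cvtps_pd(f));
}
```

Under the default rounding mode, `round_to_int(2.5)` gives 2 and `round_to_int(3.5)` gives 4, while `trunc_to_int(-2.9)` gives -2.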
SSE2 Packed Single-Precision Floating-Point Instructions
The SSE2 packed single-precision floating-point instructions operate on single-precision floating-point and integer operands.
Table 3-42 SSE2 Packed Single-Precision Floating-Point Instructions
| Instruction | Description |
|---|---|
| CVTDQ2PS | convert packed doubleword integers to packed single-precision floating-point values |
| CVTPS2DQ | convert packed single-precision floating-point values to packed doubleword integers |
| CVTTPS2DQ | convert with truncation packed single-precision floating-point values to packed doubleword integers |
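These conversions handle four elements at a time. A minimal C sketch with `<emmintrin.h>` intrinsics follows; the function names are hypothetical and each function expects arrays of exactly four elements.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Convert four 32-bit integers to four floats (usually CVTDQ2PS). */
void ints_to_floats(float *out, const int *in)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_ps(out, _mm_cvtepi32_ps(v));
}

/* Convert four floats to four 32-bit integers, truncating toward zero
   (usually CVTTPS2DQ). */
void floats_to_ints_trunc(int *out, const float *in)
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(v));
}
```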
SSE2 128-Bit SIMD Integer Instructions
The SSE2 SIMD integer instructions operate on packed words, doublewords, and quadwords contained in
XMM and MMX registers.
Table 3-43 SSE2 128-Bit SIMD Integer Instructions
| Instruction | Description |
|---|---|
| MOVDQ2Q | move quadword integer from XMM to MMX registers |
| MOVDQA | move aligned double quadword |
| MOVDQU | move unaligned double quadword |
| MOVQ2DQ | move quadword integer from MMX to XMM registers |
| PADDQ | add packed quadword integers |
| PMULUDQ | multiply packed unsigned doubleword integers |
| PSHUFD | shuffle packed doublewords |
| PSHUFHW | shuffle packed high words |
| PSHUFLW | shuffle packed low words |
| PSLLDQ | shift double quadword left logical |
| PSRLDQ | shift double quadword right logical |
| PSUBQ | subtract packed quadword integers |
| PUNPCKHQDQ | unpack high quadwords |
| PUNPCKLQDQ | unpack low quadwords |
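A few of the integer operations can be sketched in C with `<emmintrin.h>` intrinsics. The function names are hypothetical, and the instruction names in the comments are the usual compiler choices.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* out[i] = a[i] + b[i] for two 64-bit elements, wrapping on overflow
   (usually MOVDQU loads and PADDQ). */
void add_u64_pair(uint64_t *out, const uint64_t *a, const uint64_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi64(va, vb));
}

/* Reverse the order of four 32-bit elements (usually PSHUFD). */
void reverse_u32x4(uint32_t *out, const uint32_t *in)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out,
                     _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Two widening 32x32 -> 64-bit unsigned multiplies (usually PMULUDQ):
   out[0] = a0 * b0, out[1] = a1 * b1. Only doubleword elements 0 and 2
   of each operand take part in the multiplication. */
void widening_mul(uint64_t *out, uint32_t a0, uint32_t a1,
                  uint32_t b0, uint32_t b1)
{
    __m128i va = _mm_set_epi32(0, (int)a1, 0, (int)a0);
    __m128i vb = _mm_set_epi32(0, (int)b1, 0, (int)b0);
    _mm_storeu_si128((__m128i *)out, _mm_mul_epu32(va, vb));
}
```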
SSE2 Miscellaneous Instructions
The SSE2 instructions described below provide additional functionality for caching non-temporal data when storing
data from XMM registers to memory, and provide additional control of instruction ordering
on store operations.
Table 3-44 SSE2 Miscellaneous Instructions
| Instruction | Description | Notes |
|---|---|---|
| CLFLUSH | flushes and invalidates a memory operand and its associated cache line from all levels of the processor's cache hierarchy | |
| LFENCE | serializes load operations | |
| MASKMOVDQU | non-temporal store of selected bytes from an XMM register into memory | |
| MFENCE | serializes load and store operations | |
| MOVNTDQ | non-temporal store of double quadword from an XMM register into memory | |
| MOVNTI | non-temporal store of a doubleword from a general-purpose register into memory | movntiq valid only under -m64 |
| MOVNTPD | non-temporal store of two packed double-precision floating-point values from an XMM register into memory | |
| PAUSE | improves the performance of spin-wait loops | |
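As a rough illustration of how non-temporal stores, fences, and PAUSE are typically combined, consider the C sketch below, written with `<emmintrin.h>` intrinsics. The function names and the bounded spin count are assumptions made for the example; real synchronization code would normally use the platform's atomic or threading primitives.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Fill n ints with non-temporal doubleword stores (usually MOVNTI),
   then issue MFENCE so the stores are ordered before later loads
   and stores. */
void fill_streaming(int *dst, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(dst + i, value);
    _mm_mfence();
}

/* Spin until *flag becomes nonzero or max_spins polls have elapsed.
   Returns 1 if the flag was seen set, 0 on timeout. PAUSE hints to the
   processor that this is a spin-wait loop. */
int spin_wait(volatile int *flag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (*flag)
            return 1;
        _mm_pause();
    }
    return 0;
}
```

The spin loop is bounded only so that the sketch terminates when run on a single thread.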