Michigan State University, Computer Science and Engineering

Packed Parallel Compare
No new condition code flags: no existing IA condition code flags are affected by this instruction.
The result can be used as a mask to select elements from different inputs using logical operations, eliminating branches.

Packed Multiply-Add
Multiply four pairs of signed 16-bit words, generating four 32-bit products.
Add the two products on the left for one 32-bit result and the two products on the right for the other.

Multiply-Accumulate
Multiply-accumulate operations are fundamental to many signal-processing algorithms, such as vector dot products, matrix multiplies, FIR and IIR filters, FFTs, and DCTs.

Packed Add Word with Unsigned Saturation
Each addition is independent.
The rightmost addition overflows and saturates.

No Mode
There is no "saturation mode" bit: a new mode bit would require a change to the operating system.
Separate instructions are used to generate wrap-around and saturating results.
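
The packed compare and packed multiply-add slides can be made concrete with a small scalar sketch. The Python below only emulates the lane semantics of the MMX instructions PCMPGTW (packed compare greater-than on words) and PMADDWD (packed multiply-add on words), using lists of signed 16-bit values; it does not execute MMX code. The function names `packed_compare_gt`, `select`, `packed_multiply_add`, and `dot_product` are hypothetical helpers chosen for illustration. The dot product assumes the vector length is a multiple of four and keeps two wrapping 32-bit accumulators, roughly mirroring a PMADDWD/PADDD loop.

```python
# Scalar emulation of MMX-style packed word operations (lane semantics only).
# Each "register" is a list of four signed 16-bit integers.

def to_s32(x):
    """Wrap an integer to the signed 32-bit range, as a 32-bit lane would."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def to_s16(x):
    """Wrap an integer to the signed 16-bit range."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def packed_compare_gt(a, b):
    """PCMPGTW-like: a lane becomes all ones (0xFFFF) where a > b, else 0."""
    return [0xFFFF if x > y else 0 for x, y in zip(a, b)]

def select(mask, a, b):
    """Branch-free select per lane: (a AND mask) OR (b AND NOT mask)."""
    return [to_s16((x & m) | (y & ~m & 0xFFFF)) for m, x, y in zip(mask, a, b)]

def packed_multiply_add(a, b):
    """PMADDWD-like: four 16x16-bit products, adjacent pairs summed into two 32-bit lanes."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four 16-bit lanes per operand")
    p = [x * y for x, y in zip(a, b)]
    return [to_s32(p[0] + p[1]), to_s32(p[2] + p[3])]

def dot_product(x, y):
    """Multiply-accumulate four elements per step into two wrapping 32-bit accumulators."""
    if len(x) != len(y) or len(x) % 4:
        raise ValueError("vectors must have equal length, a multiple of 4")
    acc = [0, 0]
    for i in range(0, len(x), 4):
        lo, hi = packed_multiply_add(x[i:i + 4], y[i:i + 4])
        acc = [to_s32(acc[0] + lo), to_s32(acc[1] + hi)]  # like PADDD on two lanes
    return to_s32(acc[0] + acc[1])  # final horizontal add, also wrapped to 32 bits
```

For example, `select(packed_compare_gt(a, b), a, b)` yields the per-lane maximum of `a` and `b` with no branch; on real hardware the select step would be built from the logical instructions PAND, PANDN, and POR.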

Saturation
Saturation: if an addition overflows or underflows, the result is clamped to the largest or smallest representable value.
This is important for pixel calculations, where wrap-around would make a pixel jump to the opposite extreme of intensity: an overflowing add can turn a white pixel black, and an underflowing subtract can turn a black pixel white.

Packed Add Word with Wrap-Around
Each addition is independent.
The rightmost addition overflows and wraps around.

57 Instructions
Basic arithmetic: add, subtract, multiply, arithmetic shift, and multiply-add
Comparison
Conversion: pack and unpack
Logical
Shift
Move: register-to-register
Load/store: 64-bit and 32-bit

Compatibility
No new exceptions or states are added.
MMX registers alias the existing floating-point registers: when an MMX instruction writes a register, the exponent field of the corresponding floating-point register (bits 64-78) and the sign bit (bit 79) are set to ones (1's), making the value a NaN (Not a Number) or infinity when viewed as a floating-point value.

Example
Pixels are generally 8-bit integers.
Pack eight pixels into a 64-bit MMX register.
An MMX instruction takes all eight pixels at once from the MMX register, performs the arithmetic or logical operation on all eight elements in parallel, and writes the result into an MMX register.
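
To illustrate the pixel example and the saturation slide together, here is a hedged Python sketch that packs eight 8-bit pixels into one 64-bit integer and adds a brightness offset to all eight lanes, once with wrap-around (the behavior of the MMX instruction PADDB) and once with unsigned saturation (PADDUSB). It is a scalar emulation, not MMX code; `pack_pixels`, `unpack_pixels`, `add_bytes_wrap`, and `add_bytes_saturate` are hypothetical names, and lane 0 is assumed to be the least significant byte.

```python
# Scalar emulation of MMX-style packed byte additions on a 64-bit word.
# Lane 0 is the least significant byte (assumed x86 little-endian lane order).

LANES = 8  # eight 8-bit pixels fill one 64-bit register

def pack_pixels(pixels):
    """Pack eight 0..255 pixel values into one 64-bit integer."""
    if len(pixels) != LANES or not all(0 <= p <= 255 for p in pixels):
        raise ValueError("expected eight values in 0..255")
    word = 0
    for i, p in enumerate(pixels):
        word |= p << (8 * i)
    return word

def unpack_pixels(word):
    """Split a 64-bit integer into its eight byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]

def add_bytes_wrap(a, b):
    """PADDB-like: each lane adds independently; a carry out of a lane is discarded."""
    return pack_pixels([(x + y) & 0xFF for x, y in zip(unpack_pixels(a), unpack_pixels(b))])

def add_bytes_saturate(a, b):
    """PADDUSB-like: each lane is clamped to 255 instead of wrapping."""
    return pack_pixels([min(x + y, 255) for x, y in zip(unpack_pixels(a), unpack_pixels(b))])
```

With pixels `[0, 50, 100, 150, 200, 215, 230, 255]` and an offset of 40 in every lane, the wrap-around version turns the near-white pixels 230 and 255 into the dark values 14 and 39, while the saturating version clamps both to 255. Because each lane is computed separately, a carry never spills into the neighboring pixel, unlike a plain 64-bit integer add.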

Data Types
Packed byte (eight 8-bit elements), packed word (four 16-bit elements), packed doubleword (two 32-bit elements), and quadword (one 64-bit element), each held in a 64-bit MMX register.

MMX Technology
A set of basic, general-purpose integer instructions
Single Instruction, Multiple Data (SIMD)
57 new instructions
Eight 64-bit wide MMX registers
Four new data types

Common Characteristics
Small integer data types: e.g. 8-bit pixels, 16-bit audio samples
Small, highly repetitive loops
Frequent multiply-and-accumulate
Compute-intensive algorithms
Highly parallel operations
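
An MMX register holds 64 bits; which of the four data types those bits represent (packed bytes, packed words, packed doublewords, or a quadword) is decided by the instruction that operates on them. The Python sketch below reinterprets one 64-bit value as each data type using the standard `struct` module. `view` and `FORMATS` are hypothetical names for illustration, and little-endian lane order (lane 0 = least significant) is assumed to match x86.

```python
import struct

# Little-endian struct formats for each MMX data type: (unsigned, signed).
FORMATS = {
    "packed bytes": ("<8B", "<8b"),        # eight 8-bit elements
    "packed words": ("<4H", "<4h"),        # four 16-bit elements
    "packed doublewords": ("<2I", "<2i"),  # two 32-bit elements
    "quadword": ("<Q", "<q"),              # one 64-bit element
}

def view(word, data_type, signed=False):
    """Reinterpret the same 64 bits as the elements of one MMX data type."""
    raw = struct.pack("<Q", word & 0xFFFFFFFFFFFFFFFF)  # the 64-bit register contents
    fmt = FORMATS[data_type][1 if signed else 0]
    return list(struct.unpack(fmt, raw))
```

For example, `view(0x0102030405060708, "packed words")` returns `[0x0708, 0x0506, 0x0304, 0x0102]`, while the same value viewed as packed bytes is `[8, 7, 6, 5, 4, 3, 2, 1]`.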

First Step: Examine Code
Examined a wide range of applications: graphics, MPEG video, music synthesis, speech compression, speech recognition, image processing, games, and video conferencing.
Identified and analyzed the most compute-intensive routines.

Goals
Accelerate multimedia and communications applications.
Maintain full compatibility with existing operating systems and applications.
Exploit inherent parallelism in multimedia and communication algorithms.
Include new instructions and data types to improve performance.

Why MMX?
Make the common case fast.
Multimedia and communication consume significant computing resources.
Providing specific hardware support makes sense.