INTEL 80387 PROGRAMMER'S REFERENCE MANUAL 1987

Intel Corporation makes no warranty for the use of its products and assumes no responsibility for any errors which may appear in this document nor does it make a commitment to update the information contained herein. Intel retains the right to make changes to these specifications at any time, without notice. Contact your local sales office to obtain the latest specifications before placing your order.

The following are trademarks of Intel Corporation and may only be used to identify Intel Products: Above, BITBUS, COMMputer, CREDIT, Data Pipeline, FASTPATH, Genius, i, î, ICE, iCEL, iCS, iDBP, iDIS, I²ICE, iLBX, im, iMDDX, iMMX, Inboard, Insite, Intel, intel, intelBOS, Intel Certified, Intelevision, inteligent Identifier, inteligent Programming, Intellec, Intellink, iOSP, iPDS, iPSC, iRMK, iRMX, iSBC, iSBX, iSDM, iSXM, KEPROM, Library Manager, MAPNET, MCS, Megachassis, MICROMAINFRAME, MULTIBUS, MULTICHANNEL, MULTIMODULE, MultiSERVER, ONCE, OpenNET, OTP, PC BUBBLE, Plug-A-Bubble, PROMPT, Promware, QUEST, QueX, Quick-Pulse Programming, Ripplemode, RMX/80, RUPI, Seamless, SLD, SugarCube, SupportNET, UPI, and VLSiCEL, and the combination of ICE, iCS, iRMX, iSBC, iSBX, iSXM, MCS, or UPI and a numerical suffix, 4-SITE.

MDS is an ordering code only and is not used as a product name or trademark. MDS(R) is a registered trademark of Mohawk Data Sciences Corporation. *MULTIBUS is a patented Intel bus. Unix is a trademark of AT&T Bell Labs. MS-DOS, XENIX, and Multiplan are trademarks of Microsoft Corporation. Lotus and 1-2-3 are registered trademarks of Lotus Development Corporation. SuperCalc is a registered trademark of Computer Associates International. Framework is a trademark of Ashton-Tate. System 370 is a trademark of IBM Corporation. AT is a registered trademark of IBM Corporation.
Additional copies of this manual or other Intel literature may be obtained from:

Intel Corporation
Literature Distribution
Mail Stop SC6-59
3065 Bowers Avenue
Santa Clara, CA 95051

(c)INTEL CORPORATION 1987 CG-5/26/87

Customer Support

Customer Support is Intel's complete support service. It provides Intel customers with hardware support, software support, customer training, and consulting services. For more information, contact your local sales office.

After a customer purchases any system hardware or software product, service and support become major factors in determining whether that product will continue to meet the customer's expectations. Such support requires an international support organization and a breadth of programs to meet a variety of customer needs. As you might expect, Intel's customer support is quite extensive. It includes factory repair services and worldwide field service offices that provide hardware repair services, software support services, customer training classes, and consulting services.

Hardware Support Services

Intel is committed to providing an international service support package through a wide variety of service offerings available from Intel Hardware Support.

Software Support Services

Intel's software support consists of two levels of contracts. Standard support includes TIPS (Technical Information Phone Service), updates, and the subscription service (product-specific troubleshooting guides and COMMENTS Magazine). Basic support includes updates and the subscription service. Contracts are sold in environments, which represent product groupings (for example, the iRMX environment).

Consulting Services

Intel provides field systems engineering services for any phase of your development or support effort.
You can use our systems engineers in a variety of ways. These range from assistance with a new product, developing an application, personalizing training, and customizing or tailoring an Intel product, to technical and management consulting. Systems Engineers are well versed in technical areas such as microcommunications, real-time applications, embedded microcontrollers, and network services. You know your application needs; we know our products. Working together, we can help you get a successful product to market in the least possible time.

Customer Training

Intel offers a wide range of instructional programs covering various aspects of system design and implementation. In just three to ten days, a limited number of individuals learn more in a single workshop than in weeks of self-study. For optimum convenience, workshops are scheduled regularly at Training Centers worldwide, or we can take our workshops to you for on-site instruction. Intel's major course categories cover a wide variety of topics, including architecture and assembly language, programming and operating systems, and BITBUS and LAN applications.

Training Center Locations

To obtain a complete catalog of our workshops, call the nearest Training Center in your area.

Boston                      (617) 692-1000
Chicago                     (312) 310-5700
San Francisco               (415) 940-7800
Washington D.C.             (301) 474-2878
Israel                      (972) 349-491-099
Tokyo                       03-437-6611
Osaka (Call Tokyo)          03-437-6611
Toronto, Canada             (416) 675-2105
London                      (0793) 696-000
Munich                      (089) 5389-1
Paris                       (01) 687-22-21
Stockholm                   (468) 734-01-00
Milan                       39-2-82-44-071
Benelux (Rotterdam)         (10) 21-23-77
Copenhagen                  (1) 198-033
Hong Kong                   5-215311-7

Preface

This manual describes the 80387 Numeric Processor Extension (NPX) for the 80386 microprocessor. Understanding the 80387 requires an understanding of the 80386, so a brief overview of 80386 concepts is presented first.
A detailed discussion of the 80386 microprocessor can be found in the 80386 Programmer's Reference Manual.

The 80386 Microsystem

The 80386 is the basis of a new VLSI microprocessor system with exceptional capabilities for supporting large-system applications. This powerful microsystem is designed to support multiuser, reprogrammable, and real-time multitasking applications. Its dedicated system support circuits simplify system hardware, and sophisticated hardware and software tools reduce both the time and the cost of product development. The 80386 microsystem offers a total-solution approach. It enables you to develop high-speed, interactive, multiuser, multitasking (even multiprocessor) systems more rapidly and at higher performance than ever before.

• Reliability and system up-time are becoming increasingly important in all applications. Information must be protected from misuse or accidental loss. The 80386 includes a sophisticated and flexible four-level protection mechanism. This mechanism can isolate layers of operating system programs from application programs to maintain a high degree of system integrity.
• The 80386 addresses up to 4 gigabytes of physical memory to support today's application requirements. This large physical memory enables the 80386 to keep many large programs and data structures in memory simultaneously for high-speed access.
• For applications with dynamically changing memory requirements, such as multiuser business systems, the 80386 CPU provides on-chip memory management and virtual memory support. On an 80386-based system, each user can have up to 64 terabytes of virtual-address space. This large address space virtually eliminates restrictions on the size of programs that may be part of the system. The memory management features are under the control of systems software, so systems software designers can choose among a variety of memory-organization models. Systems designers can view memory as fixed-length pages, as variable-length segments, or as a combination of pages and segments. Segments can range in size from one byte to 4 gigabytes. Virtual memory can be implemented at either the segment level or the page level.
• The 80386 easily supports large multiuser or real-time multitasking systems. Its high-performance features make the 80386 highly suited to multiuser/multitasking applications. These features include a very high-speed task switch, fast interrupt-response time, intertask protection, page-oriented virtual memory, and a quick and direct operating system interface.
• The 80386 has two primary operating modes: real-address mode and protected mode. In real-address mode, the 80386/80387 is fully upward compatible with the 8086, 8088, 80186, and 80188 microprocessors and with the 80286 real-address mode. All of the extensive libraries of 8086 and 8088 software execute 15 to 20 times faster on the 80386, without any modification.
• In protected-address mode, the advanced memory management and protection features of the 80386 become available without any reduction in performance. Upgrading 8086 and 8088 application programs to use these new features usually requires only reassembly or recompilation, although some programs may require minor modification. Entire 80286 protected-mode applications can run in this mode without modification.
• The virtual-8086 mode of the 80386 is available when the primary mode is protected mode. Virtual-8086 mode enables direct execution of multiple 8086/8088 programs within a protected-mode environment. Most 8086 and 8088 application programs can execute in this environment without alteration. Refer to the 80386 Programmer's Reference Manual for the differences from the 8086.
This high degree of compatibility between the 80386 and earlier members of the 8086 processor family reduces both the time and the cost of software development.

The Organization of This Manual

This manual describes the 80387 Numeric Processor Extension (NPX) for the 80386 microprocessor. The material is presented from the perspective of software designers, at both the applications and the systems software level.

• Chapter 1, "Introduction to the 80387 Numerics Processor Extension," gives an overview of the 80387 NPX and reviews the concepts of numeric computation using the 80387.
• Chapter 2, "80387 Numerics Processor Architecture," presents the registers and data types of the 80387 to both applications and systems programmers.
• Chapter 3, "Special Computational Situations," discusses numerics exceptions. It also covers the special values that can be represented in the 80387's real formats: denormal numbers, zeros, infinities, and NaNs ("not a number" values). Systems programmers should read this chapter thoroughly. Applications programmers may skim it, because many of these special values and exceptions may never occur in applications programs.
• Chapter 4, "80387 Instruction Set," provides functional information for software designers generating applications for systems that contain an 80386 CPU with an 80387 NPX. The 80386/80387 instruction set mnemonics are explained in detail.
• Chapter 5, "Programming Numeric Applications," describes the programming facilities for 80386/80387 systems and gives a comparative 80387 programming example.
• Chapter 6, "System-Level Numeric Programming," provides information of interest to systems software writers, including details of the 80387 architecture and operational characteristics.
• Chapter 7, "Numeric Programming Examples," provides several detailed programming examples for the 80387. These cover conditional branching, conversion between floating-point values and their ASCII representations, and the use of trigonometric functions. The examples illustrate assembly-language programming on the 80387 NPX.
• Appendix A, "Machine Instruction Encoding and Decoding," gives reference information on the encoding of NPX instructions. This information is useful to writers of debuggers, exception handlers, and compilers.
• Appendix B, "Exception Summary," lists the exceptions that each instruction can cause. This list is valuable to both applications and systems programmers.
• Appendix C, "Compatibility between the 80387 and the 80287/8087," describes the differences between the 80387 and its predecessors that are common to both the 80287 and the 8087.
• Appendix D, "Compatibility between the 80387 and the 8087," describes the additional differences between the 80387 and the 8087 that matter when porting 8086/8087 programs directly to the 80386/80387.
• Appendix E, "80387 80-Bit CHMOS III Numeric Processor Extension," reproduces a separately available data sheet of 80387 specifications. Many readers of this manual will be interested in its table of instruction timings. The AC specifications have been deliberately left out. Data sheet specifications are subject to change, so consult the most recent 80387 data sheet for these specifications and for design-in information.
• Appendix F, "PC/AT-Compatible 80387 Connection," documents a nonstandard method of connecting an 80387 to an 80386 to achieve compatibility with the IBM PC/AT.
• The Glossary defines 80387 and floating-point terminology. Refer to it as needed.

Related Publications

To make the best use of this manual, readers should be familiar with the operation and architecture of 80386 systems.
The following manuals contain information related to the content of this manual and of interest to programmers of 80387 systems:

• Introduction to the 80386, order number 231252
• 80386 Data Sheet, order number 231630
• 80386 Hardware Reference Manual, order number 231732
• 80386 Programmer's Reference Manual, order number 230985
• 80387 Data Sheet, order number 231920

Notational Conventions

This manual uses special notation to represent subscript and superscript characters. Subscript characters are surrounded by {curly brackets}; for example, 10{2} = 10 base 2. Superscript characters are preceded by a caret and enclosed within (parentheses); for example, 10^(3) = 10 to the third power.

Table of Contents

Chapter 1 Introduction to the 80387 Numerics Processor Extension
  1.1 History
  1.2 Performance
  1.3 Ease of Use
  1.4 Applications
  1.5 Upgradability
  1.6 Programming Interface
Chapter 2 80387 Numerics Processor Architecture
  2.1 80387 Registers
    2.1.1 The NPX Register Stack
    2.1.2 The NPX Status Word
    2.1.3 Control Word
    2.1.4 The NPX Tag Word
    2.1.5 The NPX Instruction and Data Pointers
  2.2 Computation Fundamentals
    2.2.1 Number System
    2.2.2 Data Types and Formats
      2.2.2.1 Binary Integers
      2.2.2.2 Decimal Integers
      2.2.2.3 Real Numbers
    2.2.3 Rounding Control
    2.2.4 Precision Control
Chapter 3 Special Computational Situations
  3.1 Special Numeric Values
    3.1.1 Denormal Real Numbers
      3.1.1.1 Denormals and Gradual Underflow
    3.1.2 Zeros
    3.1.3 Infinity
    3.1.4 NaN (Not-a-Number)
      3.1.4.1 Signaling NaNs
      3.1.4.2 Quiet NaNs
    3.1.5 Indefinite
    3.1.6 Encoding of Data Types
    3.1.7 Unsupported Formats
  3.2 Numeric Exceptions
    3.2.1 Handling Numeric Exceptions
      3.2.1.1 Automatic Exception Handling
      3.2.1.2 Software Exception Handling
    3.2.2 Invalid Operation
      3.2.2.1 Stack Exception
      3.2.2.2 Invalid Arithmetic Operation
    3.2.3 Division by Zero
    3.2.4 Denormal Operand
    3.2.5 Numeric Overflow and Underflow
      3.2.5.1 Overflow
      3.2.5.2 Underflow
    3.2.6 Inexact (Precision)
    3.2.7 Exception Priority
    3.2.8 Standard Underflow/Overflow Exception Handler
Chapter 4 The 80387 Instruction Set
  4.1 Compatibility with the 80287 and 8087
  4.2 Numeric Operands
  4.3 Data Transfer Instructions
    4.3.1 FLD source
    4.3.2 FST destination
    4.3.3 FSTP destination
    4.3.4 FXCH//destination
    4.3.5 FILD source
    4.3.6 FIST destination
    4.3.7 FISTP destination
    4.3.8 FBLD source
    4.3.9 FBSTP destination
  4.4 Nontranscendental Instructions
    4.4.1 Addition
    4.4.2 Normal Subtraction
    4.4.3 Reversed Subtraction
    4.4.4 Multiplication
    4.4.5 Normal Division
    4.4.6 Reversed Division
    4.4.7 FSQRT
    4.4.8 FSCALE
    4.4.9 FPREM: Partial Remainder (80287/8087-Compatible)
    4.4.10 FPREM1: Partial Remainder (IEEE Std. 754-Compatible)
    4.4.11 FRNDINT
    4.4.12 FXTRACT
    4.4.13 FABS
    4.4.14 FCHS
  4.5 Comparison Instructions
    4.5.1 FCOM//source
    4.5.2 FCOMP//source
    4.5.3 FCOMPP
    4.5.4 FICOM source
    4.5.5 FICOMP source
    4.5.6 FTST
    4.5.7 FUCOM//source
    4.5.8 FUCOMP//source
    4.5.9 FUCOMPP
    4.5.10 FXAM
  4.6 Transcendental Instructions
    4.6.1 FCOS
    4.6.2 FSIN
    4.6.3 FSINCOS
    4.6.4 FPTAN
    4.6.5 FPATAN
    4.6.6 F2XM1
    4.6.7 FYL2X
    4.6.8 FYL2XP1
  4.7 Constant Instructions
    4.7.1 FLDZ
    4.7.2 FLD1
    4.7.3 FLDPI
    4.7.4 FLDL2T
    4.7.5 FLDL2E
    4.7.6 FLDLG2
    4.7.7 FLDLN2
  4.8 Processor Control Instructions
    4.8.1 FINIT/FNINIT
    4.8.2 FLDCW source
    4.8.3 FSTCW/FNSTCW destination
    4.8.4 FSTSW/FNSTSW destination
    4.8.5 FSTSW AX/FNSTSW AX
    4.8.6 FCLEX/FNCLEX
    4.8.7 FSAVE/FNSAVE destination
    4.8.8 FRSTOR source
    4.8.9 FSTENV/FNSTENV destination
    4.8.10 FLDENV source
    4.8.11 FINCSTP
    4.8.12 FDECSTP
    4.8.13 FFREE destination
    4.8.14 FNOP
    4.8.15 FWAIT (CPU Instruction)
Chapter 5 Programming Numeric Applications
  5.1 Programming Facilities
    5.1.1 High-Level Languages
    5.1.2 C Programs
    5.1.3 PL/M-386
    5.1.4 ASM386
      5.1.4.1 Defining Data
      5.1.4.2 Records and Structures
      5.1.4.3 Addressing Methods
    5.1.5 Comparative Programming Example
    5.1.6 80387 Emulation
  5.2 Concurrent Processing with the 80387
    5.2.1 Managing Concurrency
      5.2.1.1 Incorrect Exception Synchronization
      5.2.1.2 Proper Exception Synchronization
Chapter 6 System-Level Numeric Programming
  6.1 80386/80387 Architecture
    6.1.1 Instruction and Operand Transfer
    6.1.2 Independent of CPU Addressing Modes
    6.1.3 Dedicated I/O Locations
  6.2 Processor Initialization and Control
    6.2.1 System Initialization
    6.2.2 Hardware Recognition of the NPX
    6.2.3 Software Recognition of the NPX
    6.2.4 Configuring the Numerics Environment
    6.2.5 Initializing the 80387
    6.2.6 80387 Emulation
    6.2.7 Handling Numerics Exceptions
    6.2.8 Simultaneous Exception Response
    6.2.9 Exception Recovery Examples
Chapter 7 Numeric Programming Examples
  7.1 Conditional Branching Example
  7.2 Exception Handling Examples
  7.3 Floating-Point to ASCII Conversion Examples
    7.3.1 Function Partitioning
    7.3.2 Exception Considerations
    7.3.3 Special Instructions
    7.3.4 Description of Operation
    7.3.5 Scaling the Value
      7.3.5.1 Inaccuracy in Scaling
      7.3.5.2 Avoiding Underflow and Overflow
      7.3.5.3 Final Adjustments
    7.3.6 Output Format
  7.4 Trigonometric Calculation Examples (Not Tested)
Appendix A Machine Instruction Encoding and Decoding
Appendix B Exception Summary
Appendix C Compatibility Between the 80387 and the 80287/8087
Appendix D Compatibility Between the 80387 and the 8087
Appendix E 80387 80-Bit CHMOS III Numeric Processor Extension
Appendix F PC/AT-Compatible 80387 Connection
Glossary of 80387 and Floating-Point Terminology

Figures

1-1 Evolution and Performance of Numeric Processors
2-1 80387 Register Set
2-2 80387 Status Word
2-3 80387 Control Word Format
2-4 80387 Tag Word Format
2-5 Protected Mode 80387 Instruction and Data Pointer Image in Memory, 32-Bit Format
2-6 Real Mode 80387 Instruction and Data Pointer Image in Memory, 32-Bit Format
2-7 Protected Mode 80387 Instruction and Data Pointer Image in Memory, 16-Bit Format
2-8 Real Mode 80387 Instruction and Data Pointer Image in Memory, 16-Bit Format
2-9 80387 Double-Precision Number System
2-10 80387 Data Formats
3-1 Floating-Point System with Denormals
3-2 Floating-Point System without Denormals
3-3 Arithmetic Example Using Infinity
4-1 FSAVE/FRSTOR Memory Layout (32-Bit)
4-2 FSAVE/FRSTOR Memory Layout (16-Bit)
4-3 Protected Mode 80387 Environment, 32-Bit Format
4-4 Real Mode 80387 Environment, 32-Bit Format
4-5 Protected Mode 80387 Environment, 16-Bit Format
4-6 Real Mode 80387 Environment, 16-Bit Format
5-1 Sample C-386 Program
5-2 Sample 80387 Constants
5-3 Status Word Record Definition
5-4 Structure Definition
5-5 Sample PL/M-386 Program
5-6 Sample ASM386 Program
5-7 Instructions and Register Stack
5-8 Exception Synchronization Examples
6-1 Software Routine to Recognize the 80287
7-1 Conditional Branching for Compares
7-2 Conditional Branching for FXAM
7-3 Full-State Exception Handler
7-4 Reduced-Latency Exception Handler
7-5 Reentrant Exception Handler
7-6 Floating-Point to ASCII Conversion Routine
7-7 Relationships between Adjacent Joints
7-8 Robot Arm Kinematics Example

Tables

1-1 Numeric Processing Speed Comparisons
1-2 Numeric Data Types
1-3 Principal NPX Instructions
2-1 Condition Code Interpretation
2-2 Correspondence between 80387 and 80386 Flag Bits
2-3 Summary of Format Parameters
2-4 Real Number Notation
2-5 Rounding Modes
3-1 Arithmetic and Nonarithmetic Instructions
3-2 Denormalization Process
3-3 Zero Operands and Results
3-4 Infinity Operands and Results
3-5 Rules for Generating QNaNs
3-6 Binary Integer Encodings
3-7 Packed Decimal Encodings
3-8 Single and Double Real Encodings
3-9 Extended Real Encodings
3-10 Masked Responses to Invalid Operations
3-11 Masked Overflow Results
4-1 Data Transfer Instructions
4-2 Nontranscendental Instructions
4-3 Basic Nontranscendental Instructions and Operands
4-4 Condition Code Interpretation after FPREM and FPREM1 Instructions
4-5 Comparison Instructions
4-6 Condition Code Resulting from Comparisons
4-7 Condition Code Resulting from FTST
4-8 Condition Code Defining Operand Class
4-9 Transcendental Instructions
4-10 Results of FPATAN
4-11 Constant Instructions
4-12 Processor Control Instructions
5-1 PL/M-386 Built-In Procedures
5-2 ASM386 Storage Allocation Directives
5-3 Addressing Method Examples
6-1 NPX Processor State Following Initialization

Chapter 1 Introduction to the 80387 Numerics Processor Extension

The 80387 NPX is a high-performance numerics processing element. It extends the 80386 architecture by adding significant numeric capabilities and direct support for floating-point, extended-integer, and BCD data types. The 80386 CPU with 80387 NPX easily supports powerful and accurate numeric applications through its implementation of IEEE Standard 754 for Binary Floating-Point Arithmetic. The 80387 provides floating-point performance comparable to that of large minicomputers while offering object-code compatibility with the 8087 and 80287.

1.1 History

The 80387 Numeric Processor Extension (NPX) is compatible with its predecessors, the earlier Intel 8087 NPX and 80287 NPX. Just as the 80386 runs 8086 programs, programs designed to use the 8087 and 80287 should run unchanged on the 80387.

The 8087 NPX was designed for use in 8086-family systems. The 8086 was the first microprocessor family to partition the processing unit to permit high-performance numeric capabilities. The 8087 NPX for this processor family implemented a complete numeric processing environment in compliance with an early proposal for the IEEE 754 Floating-Point Standard.

The 80287 Numeric Processor Extension extended high-speed numeric computation to 80286 high-performance multitasking and multiuser systems. Multiple tasks using the numeric processor extension were afforded the full protection of the 80286 memory management and protection features.

The 80387 Numeric Processor Extension is Intel's third-generation numerics processor.
The 80387 implements the final IEEE standard and adds new trigonometric instructions. It also uses a new design and the CHMOS-III process, which allow higher clock rates and fewer clocks per instruction. Together, the additional instructions and the final standard make numerics programming more convenient and reliable. They also bring that convenience and reliability to applications that need the high speed and large memory capacity of the 32-bit environment of the 80386 CPU.

Figure 1-1 illustrates the relative performance of 5-MHz 8086/8087, 8-MHz 80286/80287, and 20-MHz 80386/80387 systems in executing numerics-oriented applications.

Figure 1-1. Evolution and Performance of Numeric Processors
[Chart: relative performance (vertical axis, scale 1 to 16) versus year introduced (horizontal axis: 1980, 1983, 1987) for 8086/8087 (5 MHz), 80286/80287 (8 MHz), and 80386/80387 (20 MHz) systems.]

1.2 Performance

Table 1-1 compares the execution times of several 80387 instructions with the equivalent operations executed on an 8-MHz 80287. As the table indicates, the 16-MHz 80387 NPX provides about 5 to 6 times the performance of an 8-MHz 80287 NPX. A 16-MHz 80387 multiplies 32-bit and 64-bit floating-point numbers in about 1.9 and 2.8 microseconds, respectively. Of course, the actual performance of the NPX in a given system depends on the characteristics of the individual application.

The performance figures in Table 1-1 refer to operations on real (floating-point) numbers. However, the 80387 also manipulates fixed-point binary integers of up to 64 bits and decimal integers of up to 18 digits. The 80387 can speed up multiple-precision software algorithms for integer operations by 10 to 100 times.

Because the 80387 NPX is an extension of the 80386 CPU, setting up the NPX for computation incurs no software overhead. The 80387 and 80386 processors coordinate their activities in a manner transparent to software.
Moreover, built-in coordination facilities allow the 80386 CPU to proceed with other instructions while the 80387 NPX is simultaneously executing numeric instructions. Programs can exploit this concurrency to further increase system performance and throughput.

Table 1-1. Numeric Processing Speed Comparisons

                                                     Approximate Performance Ratio:
Floating-Point Instruction       Operation           16-MHz 80386/80387 ÷ 8-MHz 80286/80287
-------------------------------  ------------------  ---------------------------------------
FADD ST, ST(i)                   Addition            6.2
FDIV dword_var                   Division            4.7
FYL2X (stack (0), (1) assumed)   Logarithm           6.0
FPATAN (stack (0) assumed)       Arctangent          2.6 (see note)
F2XM1 (stack (0) assumed)        Exponentiation      2.7 (see note)

Note: For FPATAN and F2XM1, the ratio is higher if the operand is not in range of the 80287 instruction.

1.3 Ease of Use

The 80387 NPX offers more than raw execution speed for computation-intensive tasks. It brings the functionality and power of accurate numeric computation to the general user. These features are available in most high-level languages for the 80386.

Like the 8087 and 80287 that preceded it, the 80387 is explicitly designed to deliver stable, accurate results when programmed with straightforward "pencil and paper" algorithms. IEEE Standard 754 specifically addresses this issue and recognizes the fundamental importance of making numeric computations both easy and safe to use.

For example, most computers can overflow when two single-precision floating-point numbers are multiplied together and then divided by a third. This can happen even if the final result is a perfectly valid 32-bit number, because the intermediate product can exceed the single-precision range. The 80387 delivers the correctly rounded result.
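The overflow case just described can be sketched in C. This is a minimal illustration, not Intel code. The function names are hypothetical, and C's double type stands in for the 80387's wider internal format; the 80387 itself uses its 80-bit extended format, which has even greater range. The first function rounds the intermediate product to single precision, as a machine without a wider internal format would. The second keeps the product in double precision and rounds only the final quotient.

```c
#include <assert.h>

/* (a * b) / c with the intermediate product rounded to single precision,
   as on a machine without a wider internal format. Storing into a volatile
   float forces that rounding even if the compiler would otherwise keep
   float intermediates in wider registers. */
float product_then_divide_float(float a, float b, float c)
{
    assert(c != 0.0f);               /* the example assumes a nonzero divisor */
    volatile float product = a * b;  /* becomes infinity if a*b is beyond the float range */
    return product / c;
}

/* The same expression with the product held in double precision, standing in
   for the 80387's wider internal format; only the quotient is rounded to float. */
float product_then_divide_wide(float a, float b, float c)
{
    assert(c != 0.0f);
    /* Two 24-bit float significands multiply into at most 48 bits,
       so this double-precision product is exact. */
    double product = (double)a * (double)b;
    return (float)(product / (double)c);
}
```

With a = b = c = 1e30f, product_then_divide_float returns infinity, because the product (about 10^(60)) exceeds the single-precision range. With the same arguments, product_then_divide_wide returns 1e30f, the correctly rounded single-precision result.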
Other typical examples of undesirable machine behavior in straightforward calculations occur when computing a financial rate of return, which involves the expression (1 + i)^(n), or when solving for the roots of a quadratic equation:

    x = (-b ± √(b² - 4ac)) / (2a)

If a does not equal 0, the formula is numerically unstable when the roots are nearly coincident or when their magnitudes are wildly different. The formula is also vulnerable to spurious overflows and underflows when the coefficients a, b, and c are all very large or all very small. Suppose single-precision (4-byte) floating-point coefficients are given as data and the formula is evaluated in the 80387's normal way, keeping all intermediate results on its stack. In that case, the 80387 produces impeccable single-precision roots. By default, and with no effort on the programmer's part, the 80387 evaluates all those subexpressions with so much extra precision and range that any threat to numerical integrity is overwhelmed. If double-precision data and results were at issue, a better formula would have to be used. Once again, the 80387's default evaluation of that formula would provide substantially better numerical integrity than mere double-precision evaluation.

On most machines, straightforward algorithms do not deliver consistently correct results, and they do not indicate when their results are incorrect. Obtaining correct results on traditional machines under all conditions usually requires sophisticated numerical techniques that are foreign to most programmers. General application programmers using straightforward algorithms will produce much more reliable programs on the 80387. This simple fact greatly reduces the software investment required to develop safe, accurate computation-based products.

Beyond traditional numerics support for scientific applications, the 80387 has built-in facilities for commercial computing.
It can process decimal numbers of up to 18 digits without round-off errors, performing exact arithmetic on 64-bit binary integers and 18-digit decimal integers. Exact arithmetic is vital in accounting applications, where rounding errors may introduce monetary losses that cannot be reconciled.

The NPX contains a number of optional facilities that sophisticated users can invoke. These advanced features include directed rounding, gradual underflow, and programmed exception-handling facilities.

The NPX's exception-handling facilities permit a high degree of flexibility in numeric processing software without burdening the programmer. While performing numeric calculations, the NPX automatically detects exception conditions that can potentially damage a calculation (for example, X ÷ 0, or √X when X < 0). By default, on-chip exception logic handles these exceptions so that a reasonable result is produced and execution may proceed without program interruption. Alternatively, the NPX can signal the CPU, invoking a software exception handler to provide special results whenever various types of exceptions are detected.

1.4 Applications

The versatility and performance of the 80386/80387 make it suitable for a broad array of numeric applications. In general, applications with any of the following characteristics can benefit from implementing numeric processing on the 80387:

• Numeric data vary over a wide range of values or include nonintegral values.
• Algorithms produce very large or very small intermediate results.
• Computations must be very precise; that is, a large number of significant digits must be maintained.
• Performance requirements exceed the capacity of traditional microprocessors.
• Consistently safe, reliable results must be delivered by a programming staff that is not expert in numerical techniques.
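Several of the characteristics listed above can be seen in the nearly-coincident-roots case of the quadratic formula from Section 1.3. The following C sketch evaluates only the discriminant b² - 4ac. It does so twice: once with every intermediate rounded to single precision, and once with double-precision intermediates standing in for the 80387's wider internal format. The function names and the coefficients in the usage note are illustrative assumptions, not taken from this manual.

```c
#include <assert.h>

/* Discriminant b*b - 4*a*c with each intermediate rounded to single
   precision. Storing into volatile floats forces that rounding even where
   the compiler would otherwise keep intermediates in wider registers. */
float discriminant_float(float a, float b, float c)
{
    assert(a != 0.0f);                    /* the formula applies only when a != 0 */
    volatile float bb = b * b;            /* rounded to single precision */
    volatile float four_ac = 4.0f * a * c;
    return bb - four_ac;
}

/* The same discriminant with double-precision intermediates, used here as a
   stand-in for the 80387's wider internal format. */
double discriminant_wide(float a, float b, float c)
{
    assert(a != 0.0f);
    return (double)b * b - 4.0 * (double)a * c;
}
```

Consider the hypothetical coefficients a = 1, b = 4097, c = 4196352, which define the equation (x + 2048)(x + 2049) = 0. The exact discriminant is 1, so the roots are distinct: -2048 and -2049. Rounding b² = 16785409 to single precision gives 16785408, which equals 4ac exactly. As a result, discriminant_float returns 0 and wrongly indicates a repeated root, while discriminant_wide returns exactly 1.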
Note also that the 80387 can reduce software development costs and improve the performance of systems that use not only real numbers, but operate on multiprecision binary or decimal integer values as well. A few examples, which show how the 80387 might be used in specific numerics applications, are described below. In many cases, these types of systems have been implemented in the past with minicomputers or small mainframe computers. The advent of the 80387 brings the size and cost savings of microprocessor technology to these applications for the first time. Ž Business data processing‘‘The NPX's ability to accept decimal operands and produce exact decimal results of up to 18 digits greatly simplifies accounting programming. Financial calculations that use power functions can take advantage of the 80387's exponentiation and logarithmic instructions. Many business software packages can benefit from the speed and accuracy of the 80387; for example, Lotus* 1-2-3*, Multiplan*, SuperCalc*, and Framework*. Ž Simulation‘‘The large (32-bit) memory space of the 80386 coupled with the raw speed of the 80386 and 80387 processors make 80386/80387 microsystems suitable for attacking large simulation problems, which heretofore could only be executed on expensive mini and mainframe computers. For example, complex electronic circuit simulations using SPICE can now be performed on a microcomputer, the 80386/80387. Simulation of mechanical systems using finite element analysis can employ more elements, resulting in more detailed analysis or simulation of larger systems. Ž Graphics transformations‘‘The 80387 can be used in graphics terminals to locally perform many functions that normally demand the attention of a main computer; these include rotation, scaling, and interpolation. 
  By also using an 82786 Graphics Display Controller to perform high-speed drawing and window management, very powerful and highly self-sufficient terminals can be built from a relatively small number of 80386 family parts.
• Process control: The 80387 solves dynamic range problems automatically, and its extended precision allows control functions to be fine-tuned for more accurate and efficient performance. Control algorithms implemented with the NPX also contribute to improved reliability and safety, while the 80387's speed can be exploited in real-time operations.
• Computer numerical control (CNC): The 80387 can move and position machine tool heads accurately in real time. Axis positioning also benefits from the hardware trigonometric support provided by the 80387.
• Robotics: Coupling small size and modest power requirements with powerful computational abilities, the 80387 is ideal for on-board six-axis positioning.
• Navigation: Very small, lightweight, and accurate inertial guidance systems can be implemented with the 80387. Its built-in trigonometric functions can speed and simplify the calculation of position from bearing data.
• Data acquisition: The 80387 can be used to scan, scale, and reduce large quantities of data as it is collected, thereby lowering storage requirements and the time required to process the data for analysis.

The preceding examples are oriented toward traditional numerics applications. There are, in addition, many other types of systems that do not appear to the end user as computational but can employ the 80387 to advantage. Indeed, the 80387 presents the imaginative system designer with an opportunity similar to that created by the introduction of the microprocessor itself. Many applications can be viewed as numerically based if sufficient computational power is available to support this view (e.g., character generation for a laser printer).
This is analogous to the thousands of successful products that have been built around "buried" microprocessors, even though the products themselves bear little resemblance to computers.

1.5 Upgradability

The architecture of the 80386 CPU is specifically adapted to allow easy upgrading to an 80387: the 80387 NPX is simply plugged in. For this reason, designers of 80386 systems may wish to incorporate the 80387 NPX into their designs in order to offer two levels of price and performance at little additional cost.

Two features of the 80386 CPU make the design and support of upgradable 80386 systems particularly simple:

• The 80386 can be programmed to recognize the presence of an 80387 NPX; that is, software can recognize whether it is running on an 80386 with or without an 80387 NPX.
• After determining whether the 80387 NPX is available, the 80386 CPU can be instructed to let the NPX execute all numeric instructions. If an 80387 NPX is not available, the 80386 CPU can emulate all 80387 numeric instructions in software. This emulation is completely transparent to the application software: the same object code may be used by 80386 systems both with and without an 80387 NPX. No relinking or recompiling of application software is necessary; the same code will simply execute faster with the 80387 NPX than without.

To facilitate the design of upgradable 80386 systems, Intel provides a software emulator for the 80387 that provides the functional equivalent of the 80387 hardware, implemented in software on the 80386. Except for timing, the operation of this 80387 emulator (EMUL387) is the same as for the 80387 NPX hardware. When the emulator is included as part of the system software, the 80386 system with 80387 emulation and the 80386 with 80387 hardware are virtually indistinguishable to an application program. This capability makes it easy for software developers to maintain a single set of programs for both systems.
System manufacturers can offer the NPX as a simple plug-in performance option without necessitating any changes in the user's software.

1.6 Programming Interface

The 80386/80387 pair is programmed as a single processor; all of the 80387 registers appear to a programmer as extensions of the basic 80386 register set. The 80386 has a class of instructions known as ESCAPE (ESC) instructions, all having a common format. These ESC instructions are the numeric instructions for the 80387 NPX, and they are simply encoded into the instruction stream along with 80386 instructions.

All of the CPU memory-addressing modes may be used in programming the NPX, allowing convenient access to record structures, numeric arrays, and other memory-based data structures. All of the memory management and protection features of the CPU (both paging and segmentation) are extended to the NPX as well.

Numeric processing in the 80387 centers around the NPX register stack. Programmers can treat these eight 80-bit registers either as a fixed register set, with instructions operating on explicitly designated registers, or as a classical stack, with instructions operating on the top one or two stack elements.

Internally, the 80387 holds all numbers in a uniform 80-bit extended format. Operands that may be represented in memory as 16-, 32-, or 64-bit integers, 32-, 64-, or 80-bit floating-point numbers, or 18-digit packed BCD numbers are automatically converted into extended format as they are loaded into the NPX registers. Computation results are subsequently converted back into one of these destination data formats when they are stored into memory from the NPX registers.

Table 1-2 lists each of the seven data types supported by the 80387, showing the data format for each type. All operands are stored in memory with the least significant digits starting at the initial (lowest) memory address.
Numeric instructions access and store memory operands using only this initial address. For maximum system performance, all operands should start at memory addresses divisible by four.

Table 1-3 lists the 80387 instructions by class. No special programming tools are necessary to use the 80387, because all of the NPX instructions and data types are directly supported by the ASM386 Assembler, by high-level languages from Intel, and by assemblers and compilers produced by many independent software vendors. Software routines for the 80387 may be written in ASM386 Assembler or in either of the following higher-level languages from Intel:

• PL/M-386
• C-386

In addition, all of the development tools that support the 8086/8087 and 80286/80287 can be used to develop software for the 80386/80387.

All of these high-level languages provide programmers with access to the computational power and speed of the 80387 without requiring an understanding of the architecture of the 80386 and 80387 chips. Such architectural considerations as concurrency and synchronization are handled automatically by these high-level languages. For the ASM386 programmer, specific rules for handling these issues are discussed in a later section of this manual.

The following operating systems are known or expected to support the 80387: RMX-286/386, MS-DOS, Xenix-286/386, and Unix-286/386. Advanced in-circuit debugging support is provided by ICE-386.

Table 1-2. Numeric Data Types

Data Type        Bits   Significant Digits   Approximate Range (Decimal)
                        (Decimal)
Word integer     16     4                    -32,768 ≤ X ≤ +32,767
Short integer    32     9                    -2*10^(9) ≤ X ≤ +2*10^(9)
Long integer     64     18                   -9*10^(18) ≤ X ≤ +9*10^(18)
Packed decimal   80     18                   -99...99 ≤ X ≤ +99...99 (18 digits)
Single real      32     6-7                  1.18*10^(-38) ≤ |X| ≤ 3.40*10^(38)
Double real      64     15-16                2.23*10^(-308) ≤ |X| ≤ 1.80*10^(308)
Extended real*   80     19                   3.30*10^(-4932) ≤ |X| ≤ 1.21*10^(4932)

* Equivalent to the double extended format of IEEE Std 754.

Table 1-3.
Principal NPX Instructions

Class               Instruction Types
Data Transfer       Load (all data types), Store (all data types), Exchange
Arithmetic          Add, Subtract, Multiply, Divide, Subtract Reversed, Divide Reversed, Square Root, Scale, Remainder, Integer Part, Change Sign, Absolute Value, Extract
Comparison          Compare, Examine, Test
Transcendental      Tangent, Arctangent, Sine, Cosine, Sine and Cosine, 2^(x) - 1, Y * log2(X), Y * log2(X+1)
Constants           0, 1, π, log10(2), ln(2), log2(10), log2(e)
Processor Control   Load Control Word, Store Control Word, Store Status Word, Load Environment, Store Environment, Save, Restore, Clear Exceptions, Initialize

Chapter 2
80387 Numerics Processor Architecture

To the programmer, the 80387 NPX appears as a set of additional registers, data types, and instructions, all of which complement those of the 80386. Refer to Chapter 4 for detailed explanations of the 80387 instruction set. This chapter explains the new registers and data types that the 80387 brings to the architecture of the 80386.
2.1 80387 Registers

The additional registers consist of

• Eight individually addressable 80-bit numeric registers, organized as a register stack
• Three 16-bit registers containing:
    - the NPX status word
    - the NPX control word
    - the tag word
• Two 48-bit registers containing pointers to the current instruction and operand (these registers are actually located in the 80386)

All of the NPX numeric instructions focus on the contents of these NPX registers.

2.1.1 The NPX Register Stack

The 80387 register stack is shown in Figure 2-1. Each of the eight numeric registers in the 80387's register stack is 80 bits wide and is divided into fields corresponding to the NPX's extended real data type.

Numeric instructions address the data registers relative to the register on the top of the stack. At any point in time, this top-of-stack register is indicated by the TOP (stack TOP) field in the NPX status word. Load or push operations decrement TOP by one and load a value into the new top register. A store-and-pop operation stores the value from the current TOP register and then increments TOP by one. Like 80386 stacks in memory, the 80387 register stack grows down toward lower-addressed registers.

Many numeric instructions have several addressing modes that permit the programmer to implicitly operate on the top of the stack, or to explicitly operate on specific registers relative to the TOP. The ASM386 Assembler supports these register addressing modes, using the expression ST(0), or simply ST, to represent the current stack top and ST(i) to specify the ith register from TOP in the stack (0 ≤ i ≤ 7); register numbers wrap around from 7 to 0. For example, if TOP contains 011B (register 3 is the top of the stack), the following statement would add the contents of two registers in the stack (registers 3 and 5):

    FADD ST, ST(2)

The stack organization and top-relative addressing of the numeric registers simplify subroutine programming by allowing routines to pass parameters on the register stack.
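The top-relative addressing rules above can be modeled in a few lines. This is a minimal sketch built around a hypothetical RegisterStack class. It assumes only what the text states (eight physical registers, ST(i) counted from TOP, pushes decrement TOP, pops increment it) plus wraparound modulo 8 for the 3-bit TOP field. It ignores tags, stack-overflow and stack-underflow faults, and the extended-real encoding of the data.

```python
# Minimal sketch of 80387 register-stack addressing (hypothetical model class).
# ST(i) names physical register (TOP + i) mod 8; a push (load) decrements TOP
# and a pop increments it. Tags, stack faults, and data encoding are ignored.


class RegisterStack:
    NUM_REGS = 8

    def __init__(self, top: int = 0):
        self.top = top % self.NUM_REGS        # TOP field of the status word
        self.regs = [None] * self.NUM_REGS    # physical registers R0..R7

    def physical(self, i: int) -> int:
        """Physical register number addressed by ST(i), 0 <= i <= 7."""
        if not 0 <= i < self.NUM_REGS:
            raise ValueError("ST(i) requires 0 <= i <= 7")
        return (self.top + i) % self.NUM_REGS

    def push(self, value):
        """Load/push: decrement TOP, then write the new top register."""
        self.top = (self.top - 1) % self.NUM_REGS
        self.regs[self.top] = value

    def pop(self):
        """Store-and-pop: read the current top register, then increment TOP."""
        value = self.regs[self.top]
        self.regs[self.top] = None
        self.top = (self.top + 1) % self.NUM_REGS
        return value


# The example from the text: TOP = 011B, so FADD ST, ST(2) uses registers 3 and 5.
stack = RegisterStack(top=0b011)
print(stack.physical(0), stack.physical(2))  # 3 5
```

Because a subroutine refers to its parameters only as ST(0), ST(1), and so on, the same code works no matter which physical register TOP happens to name when it is called.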
By using the stack to pass parameters rather than using "dedicated" registers, calling routines gain more flexibility in how they use the stack. As long as the stack is not full, each routine simply loads the parameters onto the stack before calling a particular subroutine to perform a numeric calculation. The subroutine then addresses its parameters as ST, ST(1), etc., even though TOP may, for example, refer to physical register 3 in one invocation and physical register 5 in another.

Figure 2-1. 80387 Register Set

Register                   Bits   Layout
R0-R7 (data registers)     80     bit 79: SIGN; bits 78-64: EXPONENT; bits 63-0: SIGNIFICAND
TAG FIELD (one per Rn)     2      bits 1-0
CONTROL REGISTER           16     bits 15-0
STATUS REGISTER            16     bits 15-0
TAG WORD                   16     bits 15-0
INSTRUCTION POINTER        48     bits 47-0
DATA POINTER               48     bits 47-0

2.1.2 The NPX Status Word

The 16-bit status word shown in Figure 2-2 reflects the overall state of the 80387. This status word may be stored into memory using the FSTSW/FNSTSW, FSTENV/FNSTENV, and FSAVE/FNSAVE instructions, and can be transferred into the 80386 AX register with the FSTSW AX/FNSTSW AX instructions, allowing the NPX status to be inspected by the CPU.

The B-bit (bit 15) is included for 8087 compatibility only. It reflects the contents of the ES bit (bit 7 of the status word), not the status of the BUSY# output of the 80387.

The four NPX condition code bits (C3-C0) are similar to the flags in a CPU: the 80387 updates these bits to reflect the outcome of arithmetic operations.
The effect of these instructions on the condition code bits is summarized in Table 2-1. These condition code bits are used principally for conditional branching. The FSTSW AX instruction stores the NPX status word directly into the CPU AX register, allowing these condition codes to be inspected efficiently by 80386 code. The 80386 SAHF instruction can then copy the condition codes from AH directly into the 80386 flag bits to simplify conditional branching. Table 2-2 shows the mapping of these bits to the 80386 flag bits.

Bits 11-13 of the status word point to the 80387 register that is the current top of stack (TOP). The significance of the stack top has been described in the prior section on the register stack.

Figure 2-2 shows the six exception flags in bits 0-5 of the status word. Bit 7 is the exception summary status (ES) bit. ES is set if any unmasked exception bits are set, and is cleared otherwise. If this bit is set, the ERROR# signal is asserted. Bits 0-5 indicate whether the NPX has detected one of six possible exception conditions since these status bits were last cleared or reset. They are "sticky" bits, and can only be cleared by the instructions FINIT, FCLEX, FLDENV, FSAVE, and FRSTOR.

Bit 6 is the stack fault (SF) bit. This bit distinguishes invalid operations due to stack overflow or underflow from other kinds of invalid operations. When SF is set, bit 9 (C1) distinguishes between stack overflow (C1 = 1) and underflow (C1 = 0).

Figure 2-2.
80387 Status Word

Bit(s)  Field  Meaning
15      B      80387 BUSY
14      C3     CONDITION CODE
13-11   TOP    TOP OF STACK POINTER
10      C2     CONDITION CODE
9       C1     CONDITION CODE
8       C0     CONDITION CODE
7       ES     ERROR SUMMARY STATUS
6       SF     STACK FAULT
5       PE     EXCEPTION FLAG: PRECISION
4       UE     EXCEPTION FLAG: UNDERFLOW
3       OE     EXCEPTION FLAG: OVERFLOW
2       ZE     EXCEPTION FLAG: ZERO DIVIDE
1       DE     EXCEPTION FLAG: DENORMALIZED OPERAND
0       IE     EXCEPTION FLAG: INVALID OPERATION

NOTE: ES IS SET IF ANY UNMASKED EXCEPTION BIT IS SET; CLEARED OTHERWISE.
SEE TABLE 2-1 FOR INTERPRETATION OF CONDITION CODE.
TOP VALUES:
  000 = REGISTER 0 IS TOP OF STACK
  001 = REGISTER 1 IS TOP OF STACK
  . . .
  111 = REGISTER 7 IS TOP OF STACK
FOR DEFINITIONS OF EXCEPTIONS, REFER TO CHAPTER 3.

Table 2-1.
Condition Code Interpretation

FPREM, FPREM1
    C0 (S), C3 (Z), C1 (A): three least significant bits of quotient: C0 = Q2, C3 = Q1, C1 = Q0 or O/U#
    C2 (C): Reduction (0 = complete, 1 = incomplete)

FCOM, FCOMP, FCOMPP, FTST, FUCOM, FUCOMP, FUCOMPP, FICOM, FICOMP
    C0 (S), C3 (Z): result of comparison
    C1 (A): zero or O/U#
    C2 (C): operand is not comparable

FXAM
    C0 (S), C3 (Z): operand class
    C1 (A): sign or O/U#
    C2 (C): operand class

FCHS, FABS, FXCH, FINCSTP, FDECSTP, constant loads, FXTRACT, FLD, FILD, FBLD, FSTP (ext real)
    C0 (S), C3 (Z): UNDEFINED
    C1 (A): zero or O/U#
    C2 (C): UNDEFINED

FIST, FBSTP, FRNDINT, FST, FSTP, FADD, FMUL, FDIV, FDIVR, FSUB, FSUBR, FSCALE, FSQRT, FPATAN, F2XM1, FYL2X, FYL2XP1
    C0 (S), C3 (Z): UNDEFINED
    C1 (A): roundup or O/U#
    C2 (C): UNDEFINED

FPTAN, FSIN, FCOS, FSINCOS
    C0 (S), C3 (Z): UNDEFINED
    C1 (A): roundup or O/U# (undefined if C2 = 1)
    C2 (C): Reduction (0 = complete, 1 = incomplete)

FLDENV, FRSTOR
    Each bit loaded from memory

FLDCW, FSTENV, FSTCW, FSTSW, FCLEX, FINIT, FSAVE
    UNDEFINED

NOTES
O/U#       When both the IE and SF bits of the status word are set, indicating a stack exception, this bit distinguishes between stack overflow (C1 = 1) and underflow (C1 = 0).
Reduction  If FPREM or FPREM1 produces a remainder that is less than the modulus, reduction is complete. When reduction is incomplete, the value at the top of the stack is a partial remainder, which can be used as input to further reduction. For FPTAN, FSIN, FCOS, and FSINCOS, the reduction bit is set if the operand at the top of the stack is too large. In this case the original operand remains at the top of the stack.
Roundup    When the PE bit of the status word is set, this bit indicates whether the last rounding in the instruction was upward.
UNDEFINED  Do not rely on finding any specific value in these bits.

Table 2-2.
Correspondence between 80387 and 80386 Flag Bits

80387 Flag   80386 Flag
C0           CF
C1           (none)
C2           PF
C3           ZF

2.1.3 Control Word

The NPX provides the programmer with several processing options, which are selected by loading a word from memory into the control word. Figure 2-3 shows the format and encoding of the fields in the control word.

The low-order byte of this control word configures the 80387 exception masking. Bits 0-5 of the control word contain individual masks for each of the six exception conditions recognized by the 80387. The high-order byte of the control word configures the 80387 processing options, including

• Precision control
• Rounding control

The precision-control (PC) bits (bits 8-9) can be used to set the 80387 internal operating precision at less than the default precision (64-bit significand). These control bits can be used to provide compatibility with earlier-generation arithmetic processors having less precision than the 80387. The precision-control bits affect the results of only the following five arithmetic instructions: ADD, SUB(R), MUL, DIV(R), and SQRT. No other operations are affected by PC.

The rounding-control (RC) bits (bits 10-11) provide for the common round-to-nearest mode, as well as directed rounding and true chop. Rounding control affects only the arithmetic instructions (refer to Chapter 3 for lists of arithmetic and nonarithmetic instructions).

Figure 2-3. 80387 Control Word Format

Bit(s)  Field  Meaning
15-13   X      RESERVED
12      IC     INFINITY CONTROL (SEE NOTE)
11-10   RC     ROUNDING CONTROL
9-8     PC     PRECISION CONTROL
7-6     X      RESERVED
5       PM     EXCEPTION MASK: PRECISION
4       UM     EXCEPTION MASK: UNDERFLOW
3       OM     EXCEPTION MASK: OVERFLOW
2       ZM     EXCEPTION MASK: ZERO DIVIDE
1       DM     EXCEPTION MASK: DENORMALIZED OPERAND
0       IM     EXCEPTION MASK: INVALID OPERATION

NOTE:
PRECISION CONTROL                    ROUNDING CONTROL
00 - 24 BITS (SINGLE PRECISION)      00 - ROUND TO NEAREST OR EVEN
01 - (RESERVED)                      01 - ROUND DOWN (TOWARD -∞)
10 - 53 BITS (DOUBLE PRECISION)      10 - ROUND UP (TOWARD +∞)
11 - 64 BITS (EXTENDED PRECISION)    11 - CHOP (TRUNCATE TOWARD ZERO)

The "infinity control" bit is not meaningful to the 80387. To maintain compatibility with the 80287, this bit can be programmed; however, regardless of its value, the 80387 treats infinity in the affine sense (-∞ < +∞).

2.1.4 The NPX Tag Word

The tag word indicates the contents of each register in the register stack, as shown in Figure 2-4. The tag word is used by the NPX itself to distinguish between empty and nonempty register locations. Programmers of exception handlers may use this tag information to check the contents of a numeric register without performing complex decoding of the actual data in the register. The tag values from the tag word correspond to physical registers 0-7. Programmers must use the current top-of-stack (TOP) pointer stored in the NPX status word to associate these tag values with the relative stack registers ST(0) through ST(7).

The exact values of the tags are generated during execution of the FSTENV and FSAVE instructions according to the actual contents of the nonempty stack locations. During execution of other instructions, the 80387 updates the tag word (TW) only to indicate whether a stack location is empty or nonempty.

Figure 2-4.
80387 Tag Word Format

Bits    Field
15-14   TAG (7)
13-12   TAG (6)
11-10   TAG (5)
9-8     TAG (4)
7-6     TAG (3)
5-4     TAG (2)
3-2     TAG (1)
1-0     TAG (0)

TAG VALUES:
  00 = VALID
  01 = ZERO
  10 = INVALID OR INFINITY
  11 = EMPTY

2.1.5 The NPX Instruction and Data Pointers

The instruction and data pointers provide support for programmed exception handlers. These registers are actually located in the 80386, but appear to be located in the 80387 because they are accessed by the ESC instructions FLDENV, FSTENV, FSAVE, and FRSTOR. Whenever the 80386 decodes an ESC instruction, it saves the instruction address, the operand address (if present), and the instruction opcode.

When stored in memory, the instruction and data pointers appear in one of four formats, depending on the operating mode of the 80386 (protected mode or real-address mode) and on the operand-size attribute in effect (32-bit or 16-bit operand). When the 80386 is in virtual-8086 mode, the real-address mode formats are used. Figures 2-5 through 2-8 show these pointers as they are stored following an FSTENV instruction.

The FSTENV and FSAVE instructions store this data into memory, allowing exception handlers to determine the precise nature of any numeric exceptions that may be encountered.

The instruction address saved in the 80386 (as in the 80287) points to any prefixes that preceded the instruction. This is different from the 8087, for which the instruction address points only to the ESC instruction opcode.

Note that the processor control instructions FINIT, FLDCW, FSTCW, FSTSW, FCLEX, FSTENV, FLDENV, FSAVE, FRSTOR, and FWAIT do not affect the data pointer. Note also that, except for the instructions just mentioned, the value of the data pointer is undefined if the prior ESC instruction did not have a memory operand.

Figure 2-5.
Protected Mode 80387 Instruction and Data Pointer Image in Memory, 32-Bit Format

32-BIT PROTECTED MODE FORMAT

Offset  Contents
0H      bits 31-16: RESERVED; bits 15-0: CONTROL WORD
4H      bits 31-16: RESERVED; bits 15-0: STATUS WORD
8H      bits 31-16: RESERVED; bits 15-0: TAG WORD
CH      bits 31-0: IP OFFSET
10H     bits 31-27: 0; bits 26-16: OPCODE 10..0; bits 15-0: CS SELECTOR
14H     bits 31-0: DATA OPERAND OFFSET
18H     bits 31-16: RESERVED; bits 15-0: OPERAND SELECTOR

Figure 2-6. Real Mode 80387 Instruction and Data Pointer Image in Memory, 32-Bit Format

32-BIT REAL ADDRESS MODE FORMAT

Offset  Contents
0H      bits 31-16: RESERVED; bits 15-0: CONTROL WORD
4H      bits 31-16: RESERVED; bits 15-0: STATUS WORD
8H      bits 31-16: RESERVED; bits 15-0: TAG WORD
CH      bits 31-16: RESERVED; bits 15-0: INSTRUCTION POINTER 15..0
10H     bits 31-28: 0; bits 27-12: INSTRUCTION POINTER 31..16; bit 11: 0; bits 10-0: OPCODE 10..0
14H     bits 31-16: RESERVED; bits 15-0: OPERAND POINTER 15..0
18H     bits 31-28: 0; bits 27-12: OPERAND POINTER 31..16; bits 11-0: 0

Figure 2-7.
Protected Mode 80387 Instruction and Data Pointer Image in Memory, 16-Bit Format

16-BIT PROTECTED MODE FORMAT

Offset  Contents (bits 15-0)
0H      CONTROL WORD
2H      STATUS WORD
4H      TAG WORD
6H      IP OFFSET
8H      CS SELECTOR
AH      OPERAND OFFSET
CH      OPERAND SELECTOR

Figure 2-8. Real Mode 80387 Instruction and Data Pointer Image in Memory, 16-Bit Format

16-BIT REAL-ADDRESS MODE AND VIRTUAL-8086 MODE FORMAT

Offset  Contents (bits 15-0)
0H      CONTROL WORD
2H      STATUS WORD
4H      TAG WORD
6H      INSTRUCTION POINTER 15..0
8H      bits 15-12: IP 19..16; bit 11: 0; bits 10-0: OPCODE 10..0
AH      OPERAND POINTER 15..0
CH      bits 15-12: OP 19..16; bits 11-0: 0

2.2 Computation Fundamentals

This section covers 80387 programming concepts that are common to all applications. It describes the 80387's internal number system and the various types of numbers that can be employed in NPX programs. The most commonly used options for rounding and precision (selected by fields in the control word) are described, with exhaustive coverage of less frequently used facilities deferred to later sections. Exception conditions that may arise during execution of NPX instructions are also described, along with the options that are available for responding to these exceptions.

2.2.1 Number System

The system of real numbers that people use for pencil-and-paper calculations is conceptually infinite and continuous. There is no upper or lower limit to the magnitude of the numbers one can employ in a calculation, or to the precision (number of significant digits) that the numbers can represent.
When considering any real number, there are always arbitrarily many numbers both larger and smaller. There are also arbitrarily many numbers between (i.e., with more significant digits than) any two real numbers. For example, between 2.5 and 2.6 are 2.51, 2.5897, 2.500001, etc.

While ideally it would be desirable for a computer to be able to operate on the entire real number system, in practice this is not possible. Computers, no matter how large, ultimately have fixed-size registers and memories that limit the system of numbers that can be accommodated. These limitations determine both the range and the precision of numbers. The result is a set of numbers that is finite and discrete, rather than infinite and continuous. This set is a subset of the real numbers that is designed to form a useful approximation of the real number system.

Figure 2-9 superimposes the basic 80387 real number system on a real number line (decimal numbers are shown for clarity, although the 80387 actually represents numbers in binary). The dots indicate the subset of real numbers the 80387 can represent as data and final results of calculations. The 80387's range of double-precision, normalized numbers is approximately ±2.23 * 10^(-308) to ±1.80 * 10^(308). Applications that are required to deal with data and final results outside this range are rare. For reference, the range of the IBM System 370* is about ±0.54 * 10^(-78) to ±0.72 * 10^(76).

The finite spacing in Figure 2-9 illustrates that the NPX can represent a great many, but not all, of the real numbers in its range. There is always a gap between two adjacent 80387 numbers, and it is possible for the result of a calculation to fall in this space. When this occurs, the NPX rounds the true result to a number that it can represent. Thus, a real number that requires more digits than the 80387 can accommodate (e.g., a 20-digit number) is represented with some loss of accuracy.
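The gap-and-rounding behavior described above can be observed with the text's own example value, 2.51. This is a minimal sketch assuming Python's float is an IEEE 754 double with round-to-nearest, which corresponds to the double-precision system of Figure 2-9 rather than to the 80387's internal extended format. math.nextafter and math.ulp require Python 3.9 or later.

```python
# Minimal sketch: gaps between representable doubles, and rounding into them.
# Assumes Python's float is an IEEE 754 double with round-to-nearest
# (the double-precision system, not the 80387's 80-bit extended format).
# math.nextafter and math.ulp require Python 3.9 or later.
import math
from fractions import Fraction

# The next representable double above 2.5; no double lies strictly between them.
next_up = math.nextafter(2.5, 3.0)
gap = next_up - 2.5                      # exact: operands within a factor of 2
print(gap == math.ulp(2.5) == 2.0**-51)  # True

# 2.51 falls inside a gap, so the literal is rounded to the nearest double.
stored = Fraction(2.51)                  # exact value of the stored double
true_value = Fraction(251, 100)
print(stored == true_value)              # False

# Round-to-nearest keeps the error within half a gap.
print(abs(stored - true_value) <= Fraction(math.ulp(2.51)) / 2)  # True
```

Fraction(2.51) exposes the exact binary value actually stored, which makes the rounding visible without relying on decimal printing.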
Notice also that the 80387's representable numbers are not distributed evenly along the real number line. In fact, an equal number of representable numbers exists between successive powers of 2 (i.e., as many representable numbers exist between 2 and 4 as between 65,536 and 131,072). Therefore, the gaps between representable numbers are larger as the numbers increase in magnitude. All integers in the range ±2^(64) (approximately ±1.8 * 10^(19)), however, are exactly representable in the extended real format described below.

In its internal operations, the 80387 actually employs a number system that is a substantial superset of that shown in Figure 2-9. The internal format (called extended real) extends the 80387's range to about ±3.30 * 10^(-4932) to ±1.21 * 10^(4932), and its precision to about 19 (equivalent decimal) digits. This format is designed to provide extra range and precision for constants and intermediate results, and is not normally intended for data or final results.

From a practical standpoint, the 80387's set of real numbers is sufficiently large and dense so as not to limit the vast majority of microprocessor applications. Compared to most computers, including mainframes, the NPX provides a very good approximation of the real number system. It is important to remember, however, that it is not an exact representation, and that arithmetic on real numbers is inherently approximate.

Conversely, and equally important, the 80387 does perform exact arithmetic on integer operands. That is, if an operation on two integers is valid and produces a result that is in range, the result is exact. For example, 4 ÷ 2 yields an exact integer, while 1 ÷ 3 does not, and neither does 2^(40) * 2^(30) + 1, because that result requires more than 64 bits of precision.

Figure 2-9.
80387 Double-Precision Number System

[Figure: a real number line showing the negative normalized range (-1.80 X 10^(308) to -2.23 X 10^(-308)), zero, and the positive normalized range (2.23 X 10^(-308) to 1.80 X 10^(308)), with dots marking representable numbers; a magnified view near 2.00000000000000000 illustrates the finite precision of the number system.]

2.2.2 Data Types and Formats

The 80387 recognizes seven numeric data types for memory-based values, divided into three classes: binary integers, packed decimal integers, and binary reals. A later section describes how these formats are stored in memory (the sign is always located in the highest-addressed byte).

Figure 2-10 summarizes the format of each data type. In the figure, the most significant digits of all numbers (and fields within numbers) are the leftmost digits.

Figure 2-10.
80387 Data Formats

In each format, the most significant byte is the highest-addressed byte.

Data Format          Range        Precision    Layout
WORD INTEGER         10^(4)       16 BITS      bits 15-0: two's complement
SHORT INTEGER        10^(9)       32 BITS      bits 31-0: two's complement
LONG INTEGER         10^(19)      64 BITS      bits 63-0: two's complement
PACKED BCD           10^(18)      18 DIGITS    bit 79: S; bits 78-72: X; bits 71-0: d17 ... d0 (magnitude)
SINGLE PRECISION     10^(±38)     24 BITS      bit 31: S; bits 30-23: BE; bits 22-0: SIGNIFICAND
DOUBLE PRECISION     10^(±308)    53 BITS      bit 63: S; bits 62-52: BE; bits 51-0: SIGNIFICAND
EXTENDED PRECISION   10^(±4932)   64 BITS      bit 79: S; bits 78-64: BE; bit 63: I; bits 62-0: SIGNIFICAND

NOTE:
(1) BE = BIASED EXPONENT
(2) S = SIGN BIT (0 = POSITIVE, 1 = NEGATIVE)
(3) dn = DECIMAL DIGIT (TWO PER BYTE)
(4) X = BITS HAVE NO SIGNIFICANCE; 80387 IGNORES WHEN LOADING, ZEROS IN WHEN STORING
(5) THE IMPLICIT BINARY POINT LIES IMMEDIATELY TO THE RIGHT OF THE INTEGER BIT
(6) I = INTEGER BIT OF SIGNIFICAND; STORED IN EXTENDED REAL, IMPLICIT IN SINGLE AND DOUBLE PRECISION
(7) EXPONENT BIAS (NORMALIZED VALUES): SINGLE: 127 (7FH); DOUBLE: 1023 (3FFH); EXTENDED REAL: 16383 (3FFFH)
(8) PACKED BCD: (-1)^(S) (d17...d0)
(9) REAL: (-1)^(S) (2^(E-BIAS)) (F0.F1...)

2.2.2.1 Binary Integers

The three binary integer formats are identical except for length, which governs the range that can be accommodated in each format. The leftmost bit is interpreted as the number's sign: 0 = positive and 1 = negative. Negative numbers are represented in standard two's complement notation (the binary integers are the only 80387 format to use two's complement). The quantity zero is represented with a positive sign (all bits are 0). The 80387 word integer format is identical to the 16-bit signed integer data type of the 80386; the 80387 short integer format is identical to the 32-bit signed integer data type of the 80386.

The binary integer formats exist in memory only.
When used by the 80387, they are automatically converted to the 80-bit extended real format. All binary integers are exactly representable in the extended real format.

2.2.2.2 Decimal Integers

Decimal integers are stored in packed decimal notation, with two decimal digits "packed" into each byte, except the leftmost byte, which carries the sign bit (0 = positive, 1 = negative). Negative numbers are not stored in two's complement form and are distinguished from positive numbers only by the sign bit. The most significant digit of the number is the leftmost digit. All digits must be in the range 0-9.

The decimal integer format exists in memory only. When used by the 80387, it is automatically converted to the 80-bit extended real format. All decimal integers are exactly representable in the extended real format.

2.2.2.3 Real Numbers

The 80387 represents real numbers of the form:

    (-1)^(s) 2^(E) (b{0}{}b{1}b{2}b{3}...b{p-1})

where:

    s = 0 or 1
    E = any integer between Emin and Emax, inclusive
    b{i} = 0 or 1
    p = number of bits of precision
    {} marks the binary point, which follows b{0}

Table 2-3 summarizes the parameters for each of the three real-number formats.

The 80387 stores real numbers in a three-field binary format that resembles scientific, or exponential, notation. The format consists of the following fields:

• The number's significant digits are held in the significand field, b{0}b{1}b{2}b{3}...b{p-1}. (The term "significand" is analogous to the term "mantissa" used to describe floating-point numbers on some computers.)
• The exponent field, e = E + bias, locates the binary point within the significant digits (and therefore determines the number's magnitude). (The term "exponent" is analogous to the term "characteristic" used to describe floating-point numbers on some computers.)
• The 1-bit sign field indicates whether the number is positive or negative. Negative numbers differ from positive numbers only in their sign bits.
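The three fields can be seen by taking a value apart. The Python sketch below splits a number into its sign, biased exponent, and fraction fields in the single format and then rebuilds it from those fields, using 178.125, the value worked through in Table 2-4. It assumes that the host's IEEE single format, which Python's `struct` module produces for the `'f'` format code, has the same layout as the 80387 single real format, and it handles normalized values only; the function names are illustrative.

```python
import struct
from typing import Tuple

BIAS_SINGLE = 127  # exponent bias for the single format (Table 2-3)

def single_fields(x: float) -> Tuple[int, int, int]:
    """Split x, rounded to single format, into (sign, biased exponent, fraction)."""
    # '>f' packs x as a 32-bit single; '>I' reads the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1-bit sign field
    biased_exp = (bits >> 23) & 0xFF   # 8-bit exponent field, e = E + bias
    fraction = bits & 0x7FFFFF         # 23 stored fraction bits; integer bit is implicit
    return sign, biased_exp, fraction

def single_value(sign: int, biased_exp: int, fraction: int) -> float:
    """Rebuild a normalized value as (-1)^s * 2^(e - bias) * 1{}fraction."""
    significand = 1 + fraction / 2**23  # restore the implicit integer bit
    return (-1) ** sign * 2.0 ** (biased_exp - BIAS_SINGLE) * significand

s, e, f = single_fields(178.125)
print(s, format(e, "08b"), format(f, "023b"))  # 0 10000110 01100100010000000000000
print(single_value(s, e, f))                   # 178.125
```

The first line printed matches the 80387 single format row of Table 2-4: sign 0, biased exponent 10000110, and the 23 stored fraction bits.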
Table 2-4 shows how the real number 178.125 (decimal) is stored in the 80387 single real format. The table lists a progression of equivalent notations that express the same value to show how a number can be converted from one form to another. (The ASM386 and PL/M-386 language translators perform a similar process when they encounter programmer-defined real number constants.) Note that not every decimal fraction has an exact binary equivalent. The decimal number 1/10, for example, cannot be expressed exactly in binary (just as the number 1/3 cannot be expressed exactly in decimal). When a translator encounters such a value, it produces a rounded binary approximation of the decimal value.

The NPX usually carries the digits of the significand in normalized form. This means that, except for the value zero, the significand contains an integer bit and fraction bits as follows:

    1{}fff...ff

where {} indicates an assumed binary point. The number of fraction bits varies according to the real format: 23 for single, 52 for double, and 63 for extended real. By normalizing real numbers so that their integer bit is always a 1, the 80387 eliminates leading zeros in small values (|X| < 1). This technique maximizes the number of significant digits that can be accommodated in a significand of a given width. Note that, in the single and double formats, the integer bit is implicit and is not actually stored; the integer bit is physically present in the extended format only.

If one were to examine only the significand with its assumed binary point, all normalized real numbers would have values greater than or equal to 1 and less than 2. The exponent field locates the actual binary point in the significant digits. Just as in decimal scientific notation, a positive exponent has the effect of moving the binary point to the right, and a negative exponent effectively moves the binary point to the left, inserting leading zeros as necessary.
An unbiased exponent of zero indicates that the position of the assumed binary point is also the position of the actual binary point. The exponent field, then, determines a real number's magnitude.

In order to simplify comparing real numbers (e.g., for sorting), the 80387 stores exponents in a biased form. This means that a constant is added to the true exponent described above. As Table 2-3 shows, the value of this bias is different for each real format. It has been chosen so as to force the biased exponent to be a positive value. This allows two real numbers (of the same format and sign) to be compared as if they were unsigned binary integers. That is, when comparing them bitwise from left to right (beginning with the leftmost exponent bit), the first bit position that differs orders the numbers; there is no need to proceed further with the comparison. A number's true exponent can be determined simply by subtracting the bias value of its format.

The single and double real formats exist in memory only. If a number in one of these formats is loaded into an 80387 register, it is automatically converted to extended format, the format used for all internal operations. Likewise, data in registers can be converted to single or double real for storage in memory. The extended real format may also be used in memory, typically to store intermediate results that cannot be held in registers.

Most applications should use the double format to store real-number data and results; it provides sufficient range and precision to return correct results with a minimum of programmer attention. The single real format is appropriate for applications that are constrained by memory, but it should be recognized that this format provides a smaller margin of safety. It is also useful for debugging algorithms, because roundoff problems manifest themselves more quickly in this format.
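The biased-exponent ordering described above can be checked on a host whose double format follows the same IEEE layout as the 80387 double real format (an assumption of this Python sketch; `double_bits` is a hypothetical helper). Sorting positive numbers by their raw 64-bit patterns, read as unsigned integers, reproduces numeric order. For negative numbers the sign bit is set and the order reverses, which is why the comparison holds only for numbers of the same sign.

```python
import struct

def double_bits(x: float) -> int:
    """Return the 64-bit pattern of x in double format as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

values = [1e300, 0.75, 1e-300, 178.125, 3.0, 2.5e-10, 1.0]

# Positive numbers: ordering the raw patterns as unsigned integers
# (biased exponent field first, then fraction) gives the numeric order.
print(sorted(values, key=double_bits) == sorted(values))  # True

# Negative numbers: with the sign bit set, a larger magnitude still gives a
# larger pattern, so unsigned order is the reverse of numeric order.
negatives = [-v for v in values]
print(sorted(negatives, key=double_bits) == sorted(negatives, reverse=True))  # True

# True exponent = biased exponent field minus the bias (1023 for double).
print(((double_bits(0.75) >> 52) & 0x7FF) - 1023)  # -1
```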
The extended real format should normally be reserved for holding intermediate results, loop accumulations, and constants. Its extra length is designed to shield final results from the effects of rounding and overflow/underflow in intermediate calculations. However, the range and precision of the double format are adequate for most microcomputer applications.

Table 2-3. Summary of Format Parameters

Parameter                   Single    Double    Extended
Format width in bits        32        64        80
p (bits of precision)       24        53        64
Exponent width in bits      8         11        15
Emax                        +127      +1023     +16383
Emin                        -126      -1022     -16382
Exponent bias               +127      +1023     +16383

Table 2-4. Real Number Notation

Notation                               Value
Ordinary Decimal                       178.125
Scientific Decimal                     1.78125E2
Scientific Binary                      1{}0110010001E111
Scientific Binary (Biased Exponent)    1{}0110010001E10000110
80387 Single Format (Normalized)       Sign   Biased Exponent   Significand
                                       0      10000110          01100100010000000000000 (1{} implicit)

2.2.3 Rounding Control

Internally, the 80387 employs three extra bits (guard, round, and sticky bits) that enable it to round numbers in accord with the infinitely precise true result of a computation; these bits are not accessible to programmers. Whenever the destination can represent the infinitely precise true result, the 80387 delivers it. Rounding occurs in arithmetic and store operations when the format of the destination cannot exactly represent the infinitely precise true result. For example, a real number may be rounded if it is stored in a shorter real format, or in an integer format. Or, the infinitely precise true result may be rounded when it is returned to a register.

The NPX has four rounding modes, selectable by the RC field in the control word (see Figure 2-3). Given a true result b that cannot be represented by the target data type, the 80387 determines the two representable numbers a and c that most closely bracket b in value (a < b < c).
The processor then rounds (changes) b to a or to c according to the mode selected by the RC field, as shown in Table 2-5. Rounding introduces an error in a result that is less than one unit in the last place to which the result is rounded.

• "Round to nearest" is the default mode and is suitable for most applications; it provides the most accurate and statistically unbiased estimate of the true result.
• The "chop" or "round toward zero" mode is provided for integer arithmetic applications.
• "Round up" and "round down" are termed directed rounding and can be used to implement interval arithmetic. Interval arithmetic generates a certifiable result independent of the occurrence of rounding and other errors. The upper and lower bounds of an interval may be computed by executing an algorithm twice, rounding up in one pass and down in the other.

Rounding control affects only the arithmetic instructions (refer to Chapter 3 for lists of arithmetic and nonarithmetic instructions).

2.2.4 Precision Control

The 80387 allows results to be calculated with 64, 53, or 24 bits of precision in the significand, as selected by the precision control (PC) field of the control word. The default setting, and the one that is best suited for most applications, is the full 64 bits of significance provided by the extended real format. The other settings are required by the IEEE standard and are provided to obtain compatibility with the specifications of certain existing programming languages. Specifying less precision nullifies the advantages of the extended format's extended fraction length. When reduced precision is specified, the rounding of the fractional value clears the unused bits on the right to zeros.

Table 2-5. Rounding Modes

RC Field   Rounding Mode             Rounding Action
00         Round to nearest          Closer to b of a or c; if equally close, select even number (the one whose least significant bit is zero).
01         Round down (toward -∞)    a
10         Round up (toward +∞)      c
11         Chop (toward 0)           Smaller in magnitude of a or c

NOTE: a < b < c; a and c are successive representable numbers; b is not representable.

Chapter 3 Special Computational Situations

Besides being able to represent positive and negative numbers, the 80387 data formats may be used to describe other entities. These special values provide extra flexibility, but most users will not need to understand them in order to use the 80387 successfully. This section describes the special values that may occur in certain cases and the significance of each. The 80387 exceptions are also described, for writers of exception handlers and for those interested in probing the limits of computation using the 80387. The material presented in this section is mainly of interest to programmers concerned with writing exception handlers; many readers will only need to skim it.

When discussing these special computational situations, it is useful to distinguish between arithmetic instructions and nonarithmetic instructions. Nonarithmetic instructions are those that have no operands or transfer their operands without substantial change; arithmetic instructions are those that make significant changes to their operands. Table 3-1 defines these two classes of instructions.

Table 3-1. Arithmetic and Nonarithmetic Instructions

Nonarithmetic Instructions            Arithmetic Instructions
FABS                                  F2XM1
FCHS                                  FADD(P)
FCLEX                                 FBLD
FDECSTP                               FBSTP
FFREE                                 FCOM(P)(P)
FINCSTP                               FCOS
FINIT                                 FDIV(R)(P)
FLD (register-to-register)            FIADD
FLD (extended format from memory)     FICOM(P)
FLD constant                          FIDIV(R)
FLDCW                                 FILD
FLDENV                                FIMUL
FNOP                                  FIST(P)
FRSTOR                                FISUB(R)
FSAVE                                 FLD (conversion)
FST(P) (register-to-register)         FMUL(P)
FSTP (extended format to memory)      FPATAN
FSTCW                                 FPREM
FSTENV                                FPREM1
FSTSW                                 FPTAN
FWAIT                                 FRNDINT
FXAM                                  FSCALE
FXCH                                  FSIN
                                      FSINCOS
                                      FSQRT
                                      FST(P) (conversion)
                                      FSUB(R)(P)
                                      FTST
                                      FUCOM(P)(P)
                                      FXTRACT
                                      FYL2X
                                      FYL2XP1

3.1 Special Numeric Values

The 80387 data formats encompass encodings for a variety of special values in addition to the typical real or integer data values that result from normal calculations. These special values have significance and can express relevant information about the computations or operations that produced them. The various types of special values are:

• Denormal real numbers
• Zeros
• Positive and negative infinity
• NaN (Not-a-Number)
• Indefinite
• Unsupported formats

The following sections explain the origins and significance of each of these special values. Tables 3-6 through 3-9 at the end of this section show how each of these special values is encoded for each of the numeric data types.

3.1.1 Denormal Real Numbers

The 80387 generally stores nonzero real numbers in normalized floating-point form; that is, the integer (leading) bit of the significand is always a one. (Refer to Chapter 2 for a review of operand formats.) This bit is explicitly stored in the extended format, and is implicitly assumed to be a one (1{}) in the single and double formats. Since leading zeros are eliminated, normalized storage allows the maximum number of significant digits to be held in a significand of a given width. When a numeric value becomes very close to zero, normalized floating-point storage cannot be used to express the value accurately.
The term tiny is used here to define precisely which values require special handling by the 80387. A number R is said to be tiny when -2^(Emin) < R < 0 or 0 < R < +2^(Emin). (As defined in Chapter 2, Emin is -126 for single format, -1022 for double format, and -16382 for extended format.) In other words, a nonzero number is tiny if its exponent would be too negative to store in the destination format.

To accommodate these instances, the 80387 can store and operate on reals that are not normalized, i.e., whose significands contain one or more leading zeros. Denormals typically arise when the result of a calculation yields a value that is tiny.

Denormal values have the following properties:

• The biased floating-point exponent is stored at its smallest value (zero).
• The integer bit of the significand (whether explicit or implicit) is zero.

The leading zeros of denormals permit smaller numbers to be represented, at the possible cost of some lost precision (the number of significant bits is reduced by the leading zeros). In typical algorithms, extremely small values are most likely to be generated as intermediate, rather than final, results. By using the NPX's extended real format for holding intermediate values, quantities as small as ±3.4*10^(-4932) can be represented; this makes the occurrence of denormal numbers a rare phenomenon in 80387 applications. Nevertheless, the NPX can load, store, and operate on denormalized real numbers when they do occur.

Denormals receive special treatment by the 80387 in three respects:

• The 80387 avoids creating denormals whenever possible. In other words, it always normalizes real numbers except in the case of tiny numbers.
• The 80387 provides the unmasked underflow exception to permit programmers to detect cases when denormals would be created.
• The 80387 provides the denormal exception to permit programmers to detect cases when denormals enter into further calculations.
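A minimal Python sketch of the two tests described above, for the double format: `is_tiny` applies the definition of a tiny number with Emin = -1022, and `classify_double` recognizes a denormal by its stored fields (biased exponent zero, nonzero significand). It assumes the host's IEEE double matches the 80387 double real layout; both function names are hypothetical, and NaNs and infinities are deliberately not handled.

```python
import struct
import sys

EMIN_DOUBLE = -1022  # Emin for the double format (Table 2-3)

def classify_double(x: float) -> str:
    """Classify a finite double as 'zero', 'denormal', or 'normal' from its fields.
    Sketch only: NaNs and infinities (exponent field all ones) are not handled."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    biased_exp = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased_exp == 0:
        # Smallest exponent value: a zero significand means zero, otherwise a denormal.
        return "zero" if fraction == 0 else "denormal"
    return "normal"

def is_tiny(r: float) -> bool:
    """True when -2^Emin < r < 0 or 0 < r < +2^Emin."""
    return 0 < abs(r) < 2.0 ** EMIN_DOUBLE

smallest_normal = sys.float_info.min  # 2^-1022, about 2.23 * 10^(-308)
for r in (smallest_normal, smallest_normal / 4, -0.0):
    print(r, classify_double(r), is_tiny(r))
```

The smallest normal double is classified as normal and is not tiny; one quarter of it is tiny and is stored as a denormal; -0.0 shares the zero exponent field of a denormal but has a zero significand.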
Denormalizing means incrementing the true result's exponent and inserting a corresponding leading zero in the significand, shifting the rest of the significand one place to the right. Denormal values may occur in any of the single, double, or extended formats. Table 3-2 illustrates how a result might be denormalized to fit a single format destination.

Denormalization produces either a denormal or a zero. Denormals are readily identified by their exponents, which are always the minimum for their formats; in biased form, this is always the bit string 00..00. This same exponent value is also assigned to the zeros, but a denormal has a nonzero significand. A denormal in a register is tagged special. Tables 3-8 and 3-9 show how denormal values are encoded in each of the real data formats.

The denormalization process causes loss of significance if low-order one bits are shifted off the right of the significand. In a severe case, all the significand bits of the true result are shifted out and replaced by the leading zeros. In this case, the result of denormalization is a true zero, and, if the value is in a register, it is tagged as a zero.

Denormals are rarely encountered in most applications. Typical debugged algorithms generate extremely small results during the evaluation of intermediate subexpressions; the final result is usually of an appropriate magnitude for its single or double format real destination. If intermediate results are held in extended real, as is recommended, the great range of this format makes underflow very unlikely. Denormals are likely to arise only when an application generates a great many intermediates, so many that they cannot be held on the register stack or in extended format memory variables. If storage limitations force the use of single or double format reals for intermediates, and small values are produced, underflow may occur, and, if masked, may generate denormals.
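The denormalization step can be modeled on integers. This Python sketch (hypothetical function name; single format; significand held as a 24-bit integer with the integer bit first) repeats the increment-and-shift step until the exponent reaches Emin = -126 and reports whether any one bits were shifted off. It reproduces the rows of Table 3-2, but it only truncates: the rounding the 80387 applies to the denormalized result is not modeled.

```python
from typing import Tuple

P_SINGLE = 24       # bits of precision in the single format (Table 2-3)
EMIN_SINGLE = -126  # Emin for the single format

def denormalize(exponent: int, significand: int,
                emin: int = EMIN_SINGLE) -> Tuple[int, int, bool]:
    """Denormalize a result whose significand is a P_SINGLE-bit integer (integer
    bit first). Returns (exponent, significand, lost), where lost is True if any
    one bit was shifted off the right. Sketch: shifted-off bits are dropped, and
    rounding of the denormalized result is not modeled."""
    lost = False
    while exponent < emin:
        lost = lost or bool(significand & 1)  # a 1 bit is about to fall off the right
        significand >>= 1                     # insert a leading zero, shift the rest right
        exponent += 1                         # and increment the exponent
    return exponent, significand, lost

# True result from Table 3-2: exponent -129, significand 1{}01011100..00.
true_sig = 0b101011100 << (P_SINGLE - 9)
e, sig, lost = denormalize(-129, true_sig)
print(e, format(sig, "024b"), lost)  # -126 000101011100000000000000 False

# Extreme underflow: every significand bit is shifted out, leaving a true zero.
print(denormalize(-160, true_sig))   # (-126, 0, True)
```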
When a denormal number in single or double format is used as a source operand and the denormal exception is masked, the 80387 automatically normalizes the number when it is converted to extended format.

Table 3-2. Denormalization Process

Operation          Sign   Exponent   Significand
True Result        0      -129       1{}01011100..00
Denormalize        0      -128       0{}101011100..00
Denormalize        0      -127       0{}0101011100..00
Denormalize        0      -126       0{}00101011100..00
Denormal Result    0      -126       0{}00101011100..00

3.1.1.1 Denormals and Gradual Underflow

Floating-point arithmetic cannot carry out all operations exactly for all operands; approximation is unavoidable when the exact result is not representable as a floating-point variable. To keep the approximation mathematically tractable, the hardware is made to conform to accuracy standards that can be modeled by certain inequalities instead of equations.

Let the assignment X ← Y @ Z (where @ is some operation) represent a typical operation. In the default rounding mode (round to nearest), each operation is carried out with an absolute error no larger than half the separation between the two floating-point numbers closest to the exact result. Let x be the value stored for the variable whose name in the program is X, and similarly y for Y, and z for Z. Normally y and z will differ by accumulated errors from what is desired and from what would have been obtained in the absence of error. For the calculation of x we assume that y and z are the best approximations available, and we seek to compute x as well as we can. If y@z is representable exactly, then we expect x = y@z, and that is what we get for every algebraic operation on the 80387 (i.e., when y@z is one of y+z, y-z, y*z, y÷z, sqrt z). But if y@z must be approximated, as is usually the case, then x must differ from y@z by no more than half the difference between the two representable numbers that straddle y@z. That difference depends on two factors:

1. The precision to which the calculation is carried out, as determined either by the precision control bits or by the format used in memory. On the 80387, the precisions are single (24 significant bits), double (53 significant bits), and extended (64 significant bits).
2. How close y@z is to zero. In this respect the presence of denormal numbers on the 80387 provides a distinct advantage over systems that do not admit denormal numbers.

In any floating-point number system, the density of representable numbers is greater near zero than near the largest representable magnitudes. However, machines that do not use denormal numbers suffer from an enormous gap between zero and its closest neighbors.

Figures 3-1 and 3-2 show what happens near zero in two kinds of floating-point number systems. Figure 3-1 shows a floating-point number system that (like the 80387) admits denormal numbers. For simplicity, only the non-negative numbers appear, and the figure illustrates a number system that carries just four significant bits instead of the 24, 53, or 64 significant bits that the 80387 offers. Each vertical mark stands for a number representable in four significant bits, and the bolder marks stand for the normal powers of 2. The denormal numbers lie between 0 and the nearest normal power of 2. They are no less dense than the remaining normal nonzero numbers.

Figure 3-2 shows a floating-point number system that (unlike the 80387) does not admit denormal numbers. There are two yawning gaps, one on the positive side of zero (as illustrated) and one on the negative side of zero (not illustrated). The gap between zero and the nearest neighbor of zero differs from the gap between that neighbor and the next bigger number by a factor of about 8.4 * 10^(6) for single, 4.5 * 10^(15) for double, and 9.2 * 10^(18) for extended format. Those gaps would horribly complicate error analysis.
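The gap factors quoted above follow from the format parameters: without denormals, the nearest neighbor of zero is 2^(Emin), and the next larger number exceeds it by 2^(Emin-p+1), so the two gaps differ by 2^(p-1). This Python sketch (hypothetical helper name; `math.nextafter` requires Python 3.9 or later) evaluates that ratio for p = 24, 53, and 64 and cross-checks the double case against the host's IEEE double arithmetic.

```python
import math
import sys

def gap_ratio(p: int) -> int:
    """Gap from 0 to 2^Emin, divided by the gap from 2^Emin to the next number,
    2^(Emin-p+1): the ratio is 2^(p-1), independent of Emin."""
    return 2 ** (p - 1)

for name, p in (("single", 24), ("double", 53), ("extended", 64)):
    print(f"{name:8} {gap_ratio(p):.1e}")
# single   8.4e+06
# double   4.5e+15
# extended 9.2e+18

# Cross-check for double: gap from 0 to the smallest normal number,
# divided by the distance from that number to its next neighbor.
tiny = sys.float_info.min
print(tiny / (math.nextafter(tiny, math.inf) - tiny) == gap_ratio(53))  # True
```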
The advantage of denormal numbers is apparent when one considers what happens in either case when the underflow exception is masked and y@z falls into the space between zero and the smallest normal magnitude. The 80387 returns the nearest denormal number. This action might be called "gradual underflow." The effect is no different from the rounding that can occur when y@z falls in the normal range. On the other hand, the system that does not have denormal numbers returns zero as the result, an action that can be much more inaccurate than rounding. This action could be called "abrupt underflow."

Figure 3-1. Floating-Point System with Denormals
[Number line starting at 0; labeled regions: Denormals (nearest 0), then Normal Numbers.]

Figure 3-2. Floating-Point System without Denormals
[Number line starting at 0; labeled region: Normal Numbers, separated from 0 by a gap.]

3.1.2 Zeros

The value zero in the real and decimal integer formats may be signed either positive or negative, although the sign of a binary integer zero is always positive. For computational purposes, the value of zero always behaves identically, regardless of sign, and typically the fact that a zero may be signed is transparent to the programmer. If necessary, the FXAM instruction may be used to determine a zero's sign. If a zero is loaded or generated in a register, the register is tagged zero.

Table 3-3 lists the results of instructions executed with zero operands and also shows how a zero may be created from nonzero operands.

Table 3-3. Zero Operands and Results

Key to symbols used in this table:
X and Y   Nonzero operands
*         Sign of original zero operand
#         Sign of original X operand
-#        Complement of sign of original X operand
⊕         Exclusive OR of the signs of the operands

Operation       Operands                    Result              Notes
FLD, FBLD       +0                          +0
                -0                          -0
FILD            +0                          +0
FST, FSTP       +0                          +0
                -0                          -0
                +X                          +0                  When extreme underflow denormalizes the result to zero.
                -X                          -0                  When extreme underflow denormalizes the result to zero.
FBSTP           +0                          +0
                -0                          -0
FIST, FISTP     +0                          +0
                -0                          -0
                +X                          +0                  When 0 < X < 1 and rounding mode is not up.
                -X                          -0                  When 0 < X < 1 and rounding mode is not down.
Addition        +0 plus +0                  +0
                -0 plus -0                  -0
                +0 plus -0, -0 plus +0      ±0                  Sign determined by rounding mode: + for nearest, up, or chop; - for down.
                -X plus +X, +X plus -X      ±0                  Sign determined by rounding mode: + for nearest, up, or chop; - for down.
                ±0 plus ±X, ±X plus ±0      #X
Subtraction     +0 minus -0                 +0
                -0 minus +0                 -0
                +0 minus +0, -0 minus -0    ±0                  Sign determined by rounding mode: + for nearest, up, or chop; - for down.
                +X minus +X, -X minus -X    ±0                  Sign determined by rounding mode: + for nearest, up, or chop; - for down.
                ±0 minus ±X                 -#X
                ±X minus ±0                 #X
Multiplication  +0 * +0, -0 * -0            +0
                +0 * -0, -0 * +0            -0
                +0 * +X, +X * +0            +0
                +0 * -X, -X * +0            -0
                -0 * +X, +X * -0            -0
                -0 * -X, -X * -0            +0
                +X * +Y, -X * -Y            +0                  When extreme underflow denormalizes the result to zero.
                +X * -Y, -X * +Y            -0                  When extreme underflow denormalizes the result to zero.
Division        ±0 ÷ ±0                     Invalid Operation
                ±X ÷ ±0                     ⊕∞                  Zero Divide
                +0 ÷ +X, -0 ÷ -X            +0
                +0 ÷ -X, -0 ÷ +X            -0
                -X ÷ -Y, +X ÷ +Y            +0                  When extreme underflow denormalizes the result to zero.
                -X ÷ +Y, +X ÷ -Y            -0                  When extreme underflow denormalizes the result to zero.
FPREM, FPREM1   ±0 rem ±0                   Invalid Operation
                ±X rem ±0                   Invalid Operation
                +0 rem ±X                   +0
                -0 rem ±X                   -0
FPREM           +X rem ±Y                   +0                  Y exactly divides X.
                -X rem ±Y                   -0                  Y exactly divides X.
FPREM1          +X rem ±Y                   +0                  Y exactly divides X.
                -X rem ±Y                   -0                  Y exactly divides X.
FSQRT           +0                          +0
                -0                          -0
Compare         ±0 : +X                     ±0 < +X
                ±0 : ±0                     ±0 = ±0
                ±0 : -X                     ±0 > -X
FTST            ±0                          ±0 = 0
FXAM            +0                          C{3}=1; C{2}=C{1}=C{0}=0
                -0                          C{3}=C{1}=1; C{2}=C{0}=0
FCHS            +0                          -0
                -0                          +0
FABS            ±0                          +0
F2XM1           +0                          +0
                -0                          -0
FRNDINT         +0                          +0
                -0                          -0
FSCALE          ±0 scaled by -∞             *0
                ±0 scaled by +∞             Invalid Operation
                ±0 scaled by X              *0
FXTRACT         +0                          ST=+0, ST(1)=-∞     Zero Divide
                -0                          ST=-0, ST(1)=-∞     Zero Divide
FPTAN           ±0                          *0
FSIN            ±0                          *0                  Also the SIN result of FSINCOS.
FCOS            ±0                          +1                  Also the COS result of FSINCOS.
FPATAN          ±0 ÷ +X                     *0
                ±0 ÷ -X                     *π
                ±X ÷ ±0                     #π/2
                ±0 ÷ +0                     *0
                ±0 ÷ -0                     *π
                +∞ ÷ ±0                     +π/2
                -∞ ÷ ±0                     -π/2
                ±0 ÷ +∞                     *0
                ±0 ÷ -∞                     *π
FYL2X           ±Y * log(±0)                Zero Divide
                ±0 * log(±0)                Invalid Operation
FYL2XP1         +Y * log(±0+1)              *0
                -Y * log(±0+1)              -0

3.1.3 Infinity

The real formats support signed representations of infinities. These values are encoded with a biased exponent of all ones and a significand of 1{}00..00; if the infinity is in a register, it is tagged special.

A programmer may code an infinity, or it may be created by the NPX as its masked response to an overflow or a zero divide exception. Note that, depending on the rounding mode, the masked response to overflow may create the largest valid value representable in the destination rather than infinity.

The signs of the infinities are observed, and comparisons are possible. Infinities are always interpreted in the affine sense; that is, -∞ < (any finite number) < +∞. Arithmetic on infinities is always exact and, therefore, signals no exceptions, except for the invalid operations specified in Table 3-4.

Table 3-4. Infinity Operands and Results

Key to symbols used in this table:
X         Zero or nonzero positive operand
Y         Nonzero positive operand
*         Sign of original infinity operand
-*        Complement of sign of original infinity operand
$         Sign of original operand
#         Sign of the original Y operand
⊕         Exclusive OR of signs of operands

Operation       Operands                    Result              Notes
Addition        +∞ plus +∞                  +∞
                -∞ plus -∞                  -∞
                +∞ plus -∞                  Invalid Operation
                -∞ plus +∞                  Invalid Operation
                ±∞ plus ±X                  *∞
                ±X plus ±∞                  *∞
Subtraction     +∞ minus -∞                 +∞
                -∞ minus +∞                 -∞
                +∞ minus +∞                 Invalid Operation
                -∞ minus -∞                 Invalid Operation
                ±∞ minus ±X                 *∞
                ±X minus ±∞                 -*∞
Multiplication  ±∞ * ±∞                     ⊕∞
                ±∞ * ±Y, ±Y * ±∞            ⊕∞
                ±0 * ±∞, ±∞ * ±0            Invalid Operation
Division        ±∞ ÷ ±∞                     Invalid Operation
                ±∞ ÷ ±X                     ⊕∞
                ±X ÷ ±∞                     ⊕0
                ±∞ ÷ ±0                     ⊕∞
FSQRT           -∞                          Invalid Operation
                +∞                          +∞
FPREM, FPREM1   ±∞ rem ±∞                   Invalid Operation
                ±∞ rem ±X                   Invalid Operation
                ±X rem ±∞                   $X, Q = 0
FRNDINT         ±∞                          *∞
FSCALE          ±∞ scaled by -∞             Invalid Operation
                ±∞ scaled by +∞             *∞
                ±∞ scaled by ±X             *∞
                ±0 scaled by -∞             ±0                  Sign of original zero operand.
                ±0 scaled by +∞             Invalid Operation
                ±Y scaled by +∞             #∞
                ±Y scaled by -∞             #0
FXTRACT         ±∞                          ST = *∞, ST(1) = +∞
Compare         +∞ : +∞                     +∞ = +∞
                -∞ : -∞                     -∞ = -∞
                +∞ : -∞                     +∞ > -∞
                -∞ : +∞                     -∞ < +∞
                +∞ : ±X                     +∞ > X
                -∞ : ±X                     -∞ < X
                ±X : +∞                     X < +∞
                ±X : -∞                     X > -∞
FTST            +∞                          +∞ > 0
                -∞                          -∞ < 0
FPATAN          ±∞ ÷ ±X                     *π/2
                ±Y ÷ +∞                     #0
                ±Y ÷ -∞                     #π
                ±∞ ÷ +∞                     *π/4
                ±∞ ÷ -∞                     *3π/4
                ±∞ ÷ ±0                     *π/2
                +0 ÷ +∞                     +0
                +0 ÷ -∞                     +π
                -0 ÷ +∞                     -0
                -0 ÷ -∞                     -π
F2XM1           +∞                          +∞
                -∞                          -1
FYL2X, FYL2XP1  ±∞ * log(1)                 Invalid Operation
                ±∞ * log(Y>1)               *∞
                ±∞ * log(0

Operand            0    0    0    JA
ST < Operand       0    0    1    JB
ST = Operand       1    0    0    JE
Unordered          1    1    1    JP

4.5.2 FCOMP //source

FCOMP (compare real and pop) operates like FCOM, and in addition pops the stack.

4.5.3 FCOMPP

FCOMPP (compare real and pop twice) operates like FCOM and additionally pops the stack twice, discarding both operands. FCOMPP always compares ST to ST(1); no operands may be explicitly specified.

4.5.4 FICOM source

FICOM (integer compare) converts the source operand, which may reference a word or short binary integer variable, to extended real and compares the stack top to it. The condition code bits in the status word are set as for FCOM.
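After a compare, a program usually stores the status word and branches on C3, C2, and C0. The Python sketch below interprets those bits according to the condition-code pattern in the table above (the same pattern that Table 4-7 gives for FTST) and models the usual 80386 sequence FSTSW AX followed by SAHF, which copies C3 to ZF, C2 to PF, and C0 to CF. The bit positions used (C0 = bit 8, C2 = bit 10, C3 = bit 14 of the status word) are the 80387 status-word layout; the function names and the sample status word are hypothetical.

```python
from typing import Dict

# Condition code bit positions in the 80387 status word.
C0_BIT, C2_BIT, C3_BIT = 8, 10, 14

def compare_result(status_word: int) -> str:
    """Interpret C3, C2, C0 after a compare (FCOM family, FICOM, FTST)."""
    c3 = (status_word >> C3_BIT) & 1
    c2 = (status_word >> C2_BIT) & 1
    c0 = (status_word >> C0_BIT) & 1
    outcomes = {
        (0, 0, 0): "ST > operand",
        (0, 0, 1): "ST < operand",
        (1, 0, 0): "ST = operand",
        (1, 1, 1): "unordered",
    }
    return outcomes.get((c3, c2, c0), "not a compare result")

def sahf_flags(status_word: int) -> Dict[str, int]:
    """Model FSTSW AX then SAHF: AH is the high byte of the status word, and SAHF
    copies AH bit 6 to ZF, bit 2 to PF, and bit 0 to CF (that is, C3, C2, C0)."""
    ah = (status_word >> 8) & 0xFF
    return {"ZF": (ah >> 6) & 1, "PF": (ah >> 2) & 1, "CF": ah & 1}

def branches_taken(flags: Dict[str, int]) -> Dict[str, bool]:
    """Which of the conditional branches listed in the tables would jump."""
    return {
        "JA": flags["CF"] == 0 and flags["ZF"] == 0,
        "JB": flags["CF"] == 1,
        "JE": flags["ZF"] == 1,
        "JP": flags["PF"] == 1,
    }

sw = 1 << C0_BIT  # hypothetical status word with only C0 set
flags = sahf_flags(sw)
print(compare_result(sw), flags, branches_taken(flags))
# ST < operand; only JB would be taken
```

For the unordered pattern, CF, ZF, and PF are all set, so JB and JE would also be taken; a program would therefore normally test PF (JP) before relying on JA, JB, or JE.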
4.5.5 FICOMP source

FICOMP (integer compare and pop) operates identically to FICOM and additionally discards the value in ST by popping the NPX stack.

4.5.6 FTST

FTST (test) tests the top stack element by comparing it to zero. The result is posted to the condition codes as shown in Table 4-7.

Table 4-7. Condition Code Resulting from FTST

Order        C3 (ZF)   C2 (PF)   C0 (CF)   80386 Conditional Branch
ST > 0.0     0         0         0         JA
ST < 0.0     0         0         1         JB
ST = 0.0     1         0         0         JE
Unordered    1         1         1         JP

4.5.7 FUCOM //source

FUCOM (unordered compare real) operates like FCOM, with two differences:

1. It does not cause an invalid-operation exception when one of the operands is a NaN. If either operand is a NaN, the condition bits of the status word are set to unordered as shown in Table 4-6.
2. Only operands on the NPX stack can be compared.

4.5.8 FUCOMP //source

FUCOMP (unordered compare real and pop) operates like FUCOM and in addition pops the NPX stack.

4.5.9 FUCOMPP

FUCOMPP (unordered compare real and pop twice) operates like FUCOM and in addition pops the NPX stack twice, discarding both operands. FUCOMPP always compares ST to ST(1); no operands can be explicitly specified.

4.5.10 FXAM

FXAM (examine) reports the content of the top stack element as positive/negative and NaN, denormal, normal, zero, infinity, unsupported, or empty. Table 4-8 lists and interprets all the condition code values that FXAM generates.

4.6 Transcendental Instructions

The instructions in this group (Table 4-9) perform the time-consuming core calculations for all common trigonometric, inverse trigonometric, hyperbolic, inverse hyperbolic, logarithmic, and exponential functions. The transcendentals operate on the top one or two stack elements, and they return their results to the stack. The trigonometric operations assume their arguments are expressed in radians. The logarithmic and exponential operations work in base 2. The results of transcendental instructions are highly accurate.
The absolute value of the relative error of the transcendental instructions is guaranteed to be less than 2^(-62). (Relative error is the ratio between the absolute error and the exact value.) The trigonometric functions accept a practically unrestricted range of operands, whereas the other transcendental instructions require that arguments be more restricted in range. FPREM or FPREM1 may be used to bring the otherwise valid operand of a periodic function into range. Prologue and epilogue software may be used to reduce arguments for other instructions to the expected range and to adjust the result to correspond to the original arguments if necessary. The instruction descriptions in this section document the allowed operand range for each instruction.

Table 4-8. Condition Code Defining Operand Class

C3   C2   C1   C0   Value at TOP
0    0    0    0    +Unsupported
0    0    0    1    +NaN
0    0    1    0    -Unsupported
0    0    1    1    -NaN
0    1    0    0    +Normal
0    1    0    1    +Infinity
0    1    1    0    -Normal
0    1    1    1    -Infinity
1    0    0    0    +0
1    0    0    1    +Empty
1    0    1    0    -0
1    0    1    1    -Empty
1    1    0    0    +Denormal
1    1    1    0    -Denormal

Table 4-9. Transcendental Instructions

FSIN       Sine
FCOS       Cosine
FSINCOS    Sine and cosine
FPTAN      Tangent of ST
FPATAN     Arctangent of ST(1)/ST
F2XM1      2^(X) - 1; X is ST
FYL2X      Y * log{2}X; Y is ST(1), X is ST
FYL2XP1    Y * log{2}(X + 1); Y is ST(1), X is ST

4.6.1 FCOS

When complete, this function replaces the contents of ST with COS(ST). ST, expressed in radians, must satisfy |ST| < 2^(63) (for most practical purposes unrestricted). If ST is in range, C2 of the status word is cleared and the result of the operation is produced. If the operand is outside of the range, C2 is set to one (function incomplete) and ST remains intact (i.e., no reduction of the operand is performed). It is the programmer's responsibility to reduce the operand to an absolute value smaller than 2^(63). The instructions FPREM1 and FPREM are available for this purpose.

4.6.2 FSIN

When complete, this function replaces the contents of ST with SIN(ST).
FSIN is equivalent to FCOS in the way it reduces the operand. ST is expressed in radians.

4.6.3 FSINCOS

When complete, this instruction replaces the contents of ST with SIN(ST), then pushes COS(ST) onto the stack. (ST(7) must be empty to avoid an invalid exception.) FSINCOS is equivalent to FCOS in the way it reduces the operand. ST is expressed in radians.

4.6.4 FPTAN

When complete, FPTAN (partial tangent) computes the function Y = TAN(ST). ST is expressed in radians. Y replaces ST, then the value 1 is pushed, becoming the new stack top. (ST(7) must be empty to avoid an invalid exception.) When the function is complete, ST(1) = TAN(arg) and ST = 1. FPTAN is equivalent to FCOS in the way it reduces the operand.

The fact that FPTAN places two results on the stack maintains compatibility with the 8087/80287 and aids the calculation of other trigonometric functions that can be derived from tan via standard trigonometric identities. For example, the cot function is given by this identity:

    cot x = 1/tan x

Therefore, simply executing the reverse divide instruction FDIVR after FPTAN yields the cot function.

4.6.5 FPATAN

FPATAN (arctangent) computes the function θ = ARCTAN(Y/X). X is taken from ST(0) and Y from ST(1). The instruction pops the NPX stack and returns θ to the (new) stack top, overwriting the Y operand. The result is expressed in radians. The range of operands is not restricted; however, the range of the result depends on the relationship between the operands according to Table 4-10.

The fact that the argument of FPATAN is a ratio aids calculation of other trigonometric functions, including Arcsin and Arccos. These can be derived from Arctan via standard trigonometric identities. For example, the Arcsin function can be easily calculated using this identity:

    Arcsin x = Arctan(x / √(1 - x²))

Thus, to find Arcsin(Y), push Y onto the NPX stack, then calculate X = √(1 - Y²), pushing the result X onto the stack.
Executing FPATAN then leaves Arcsin(Y) at the top of the stack.

4.6.6 F2XM1

F2XM1 (2 to the X minus 1) calculates the function Y = 2^(X) - 1. X is taken from the stack top and must be in the range -1 ≤ X ≤ 1. The result Y replaces the argument X at the stack top. If the argument is out of range, the results are undefined.

This instruction is designed to produce a very accurate result even when X is close to 0. For values of the argument very close in magnitude to 1, a larger error will be incurred. To obtain Y = 2^(X), add 1 to the result delivered by F2XM1. The following formulas show how values other than 2 may be raised to a power of X:

    10^(X) = 2^(X * LOG{2}10)
    e^(X)  = 2^(X * LOG{2}e)
    Y^(X)  = 2^(X * LOG{2}Y)

As shown in the next section, the 80387 has built-in instructions for loading the constants LOG{2}10 and LOG{2}e, and the FYL2X instruction may be used to calculate X * LOG{2}Y.

Table 4-10. Results of FPATAN

    Sign(Y)  Sign(X)  |Y| < |X|?  Final Result
    +        +        Yes         0 < atan(Y/X) < π/4
    +        +        No          π/4 < atan(Y/X) < π/2
    +        -        No          π/2 < atan(Y/X) < 3π/4
    +        -        Yes         3π/4 < atan(Y/X) < π
    -        +        Yes         -π/4 < atan(Y/X) < 0
    -        +        No          -π/2 < atan(Y/X) < -π/4
    -        -        No          -3π/4 < atan(Y/X) < -π/2
    -        -        Yes         -π < atan(Y/X) < -3π/4

4.6.7 FYL2X

FYL2X (Y log base 2 of X) calculates the function Z = Y * LOG{2}X. X is taken from the stack top and Y from ST(1). The operands must be in the following ranges:

    0 ≤ X < +∞
    -∞ < Y < +∞

The instruction pops the NPX stack and returns Z at the (new) stack top, replacing the Y operand. If the operand is out of range (i.e., if X is negative), the invalid-operation exception occurs. This function optimizes the calculation of log to any base other than two, because a multiplication is always required:

    LOG{N}x = (LOG{2}N)^(-1) * LOG{2}x

4.6.8 FYL2XP1

FYL2XP1 (Y log base 2 of (X + 1)) calculates the function Z = Y * LOG{2}(X + 1). X is taken from the stack top and must be in the range -(1 - √2/2) < X < 1 - √2/2. Y is taken from ST(1) and is unlimited in range (-∞ < Y < +∞).
FYL2XP1 pops the stack and returns Z at the (new) stack top, replacing Y. If the argument is out of range, the results are undefined. This instruction provides improved accuracy over FYL2X when computing the logarithm of a number very close to 1, for example 1 + ε where ε << 1. Providing ε rather than 1 + ε as the input to the function allows more significant digits to be retained.

Table 4-11. Constant Instructions

    FLDZ     Load +0.0
    FLD1     Load +1.0
    FLDPI    Load π
    FLDL2T   Load LOG{2}10
    FLDL2E   Load LOG{2}e
    FLDLG2   Load LOG{10}2
    FLDLN2   Load LOG{e}2

4.7 Constant Instructions

Each of these instructions (Table 4-11) loads (pushes) a commonly used constant onto the stack. (ST(7) must be empty to avoid an invalid exception.) The values have full extended real precision (64 bits) and are accurate to approximately 19 decimal digits. Because an external real constant occupies 10 memory bytes, the constant instructions, which are only two bytes long, save storage and improve execution speed, in addition to simplifying programming.

The constants used by these instructions are stored internally in a format more precise even than extended real. When loading the constant, the 80387 rounds the more precise internal constant according to the RC (rounding control) field of the control word. However, in spite of this rounding, the precision exception is not raised (to maintain compatibility). When the rounding control is set to round to nearest, the 80387 produces the same constant that is produced by the 80287.

4.7.1 FLDZ

FLDZ (load zero) loads (pushes) +0.0 onto the NPX stack.

4.7.2 FLD1

FLD1 (load one) loads (pushes) +1.0 onto the NPX stack.

4.7.3 FLDPI

FLDPI (load π) loads (pushes) π onto the NPX stack.

4.7.4 FLDL2T

FLDL2T (load log base 2 of 10) loads (pushes) the value LOG{2}10 onto the NPX stack.

4.7.5 FLDL2E

FLDL2E (load log base 2 of e) loads (pushes) the value LOG{2}e onto the NPX stack.
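As a brief illustration of how the logarithm constants combine with FYL2X (section 4.6.7), the C sketch below mirrors the arithmetic of the sequence "load LOG{10}2 with FLDLG2, load X, execute FYL2X," which leaves LOG{10}2 * LOG{2}X = LOG{10}X on the stack. It is a model, not 80387 code: K_LOG2_10, fyl2x_model, and the other identifiers are hypothetical, the constants are ordinary 20-digit decimal approximations rather than the 80387's internal values, and LOG{2}X is passed in precomputed so that no math library is needed.

```c
/* Hypothetical names. The values are 20-digit decimal approximations of
   the constants pushed by FLDL2T, FLDL2E, FLDLG2, and FLDLN2; they are
   not the 80387's internal representation. */
static const long double K_LOG2_10 = 3.32192809488736234787L; /* FLDL2T */
static const long double K_LOG2_E  = 1.44269504088896340736L; /* FLDL2E */
static const long double K_LOG10_2 = 0.30102999566398119521L; /* FLDLG2 */
static const long double K_LN_2    = 0.69314718055994530942L; /* FLDLN2 */

/* Arithmetic model of FYL2X: Z = Y * LOG2(X), with LOG2(X) supplied by
   the caller instead of being computed here. */
static long double fyl2x_model(long double y, long double log2_x)
{
    return y * log2_x;
}

/* LOG10(X) = LOG10(2) * LOG2(X): the FLDLG2, FLD X, FYL2X sequence. */
static long double log10_from_log2(long double log2_x)
{
    return fyl2x_model(K_LOG10_2, log2_x);
}

/* LN(X) = LN(2) * LOG2(X): the FLDLN2, FLD X, FYL2X sequence. */
static long double ln_from_log2(long double log2_x)
{
    return fyl2x_model(K_LN_2, log2_x);
}

/* Absolute difference, written out so no math library is required. */
static long double abs_diff(long double a, long double b)
{
    return a > b ? a - b : b - a;
}
```

For example, because LOG{2}1000 = 3 * LOG{2}10, log10_from_log2(3.0L * K_LOG2_10) returns a value within rounding error of 3. The reciprocal pairs (FLDL2T with FLDLG2, FLDL2E with FLDLN2) supply the (LOG{2}N)^(-1) factor of section 4.6.7 as a multiplication rather than a division.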
4.7.6 FLDLG2

FLDLG2 (load log base 10 of 2) loads (pushes) the value LOG{10}2 onto the NPX stack.

4.7.7 FLDLN2

FLDLN2 (load log base e of 2) loads (pushes) the value LOG{e}2 onto the NPX stack.

4.8 Processor Control Instructions

The processor control instructions are shown in Table 4-12. The instruction FSTSW is commonly used for conditional branching. The remaining instructions are not typically used in calculations; they provide control over the 80387 NPX for system-level activities. These activities include initialization, exception handling, and task switching.

As shown in Table 4-12, many of the NPX processor control instructions have two forms of assembler mnemonic:

1. A wait form, where the mnemonic is prefixed only with an F, such as FSTSW. This form checks for unmasked numeric exceptions.
2. A no-wait form, where the mnemonic is prefixed with an FN, such as FNSTSW. This form ignores unmasked numeric exceptions.

When the control instruction is coded using the no-wait form of the mnemonic, the ASM386 assembler does not precede the ESC instruction with a wait instruction, and the CPU does not test the ERROR# status line from the NPX before executing the processor control instruction.

Only the processor control class of instructions has this alternate no-wait form. All numeric instructions are automatically synchronized by the 80386; the CPU transfers all operands before initiating the next instruction. Because of this automatic synchronization by the 80386, numeric instructions for the 80387 need not be preceded by a CPU wait instruction in order to execute correctly.

It should also be noted that the 8087 instructions FENI and FDISI and the 80287 instruction FSETPM perform no function in the 80387. If these opcodes are detected in an 80386/80387 instruction stream, the 80387 performs no specific operation and no internal states are affected.
For programmers interested in porting numeric software from 80287 or 8087 environments to the 80386, however, it should be noted that program sections containing these exception-handling instructions are not likely to be completely portable to the 80387. Appendix C and Appendix D contain a more complete description of the differences between the 80387 and the 80287/8087.

Table 4-12. Processor Control Instructions

    FINIT/FNINIT          Initialize processor
    FLDCW                 Load control word
    FSTCW/FNSTCW          Store control word
    FSTSW/FNSTSW          Store status word
    FSTSW AX/FNSTSW AX    Store status word to AX
    FCLEX/FNCLEX          Clear exceptions
    FSTENV/FNSTENV        Store environment
    FLDENV                Load environment
    FSAVE/FNSAVE          Save state
    FRSTOR                Restore state
    FINCSTP               Increment stack pointer
    FDECSTP               Decrement stack pointer
    FFREE                 Free register
    FNOP                  No operation
    FWAIT                 CPU Wait

4.8.1 FINIT/FNINIT

FINIT/FNINIT (initialize processor) sets the 80387 NPX into a known state, unaffected by any previous activity. It sets the control word to its default value 037FH (round to nearest, all exceptions masked, 64 bits of precision), clears the status word, and empties all floating-point stack registers. The no-wait form of this instruction causes the 80387 to abort any previous numeric operations currently executing in the NEU.

This instruction performs the functional equivalent of a hardware RESET, with one exception: RESET causes the IM bit of the control word to be reset and the ES and IE bits of the status word to be set as a means of signaling the presence of an 80387; FINIT puts the opposite values in these bits.

FINIT checks for unmasked numeric exceptions; FNINIT does not. Note that if FNINIT is executed while a previous 80387 memory-referencing instruction is running, 80387 bus cycles in progress are aborted. This instruction may be necessary to clear the 80387 if a processor-extension segment-overrun exception (interrupt 9) is detected by the CPU.
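The default control word 037FH can be unpacked field by field. The following C sketch is a hypothetical helper (not part of any Intel library) that assumes the control-word layout defined in Chapter 2: exception masks in bits 0 through 5 (IM in bit 0), precision control (PC) in bits 8 and 9, and rounding control (RC) in bits 10 and 11. The remaining bits are not decoded here.

```c
#include <stdint.h>

/* Hypothetical helper; field positions assume the control-word layout
   of Chapter 2. Bits not listed below are ignored. */
typedef struct {
    unsigned masks;     /* bits 0-5: IM, DM, ZM, OM, UM, PM (1 = masked) */
    unsigned precision; /* bits 8-9: PC field */
    unsigned rounding;  /* bits 10-11: RC field: 0 = nearest, 1 = down,
                           2 = up, 3 = toward zero */
} control_word_fields;

static control_word_fields decode_control_word(uint16_t cw)
{
    control_word_fields f;
    f.masks     = cw & 0x3Fu;
    f.precision = (cw >> 8) & 0x3u;
    f.rounding  = (cw >> 10) & 0x3u;
    return f;
}

/* Significand bits selected by the PC field; PC = 01 is reserved. */
static int precision_bits(unsigned pc)
{
    switch (pc) {
    case 0:  return 24;
    case 2:  return 53;
    case 3:  return 64;
    default: return -1;
    }
}
```

Decoding 037FH yields all six masks set, a PC field selecting 64 bits, and RC = 0 (round to nearest), matching the description of FINIT above. The value 033EH recommended for PL/M-386 programs in Chapter 5 differs only in bit 0 (invalid operation unmasked) and in reserved bit 6.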
4.8.2 FLDCW source

FLDCW (load control word) replaces the current processor control word with the word defined by the source operand. This instruction is typically used to establish or change the 80387's mode of operation. Note that if an exception bit in the status word is set, loading a new control word that unmasks that exception will activate the ERROR# output of the 80387. When changing modes, the recommended procedure is to first clear any exceptions and then load the new control word.

4.8.3 FSTCW/FNSTCW destination

FSTCW/FNSTCW (store control word) writes the processor control word to the memory location defined by the destination. FSTCW checks for unmasked numeric exceptions; FNSTCW does not.

4.8.4 FSTSW/FNSTSW destination

FSTSW/FNSTSW (store status word) writes the current value of the 80387 status word to the destination operand in memory. The instruction is used to

• Implement conditional branching following a comparison, FPREM, or FPREM1 instruction (FSTSW).
• Invoke exception handlers (by polling the exception bits) in environments that do not use interrupts (FSTSW).

FSTSW checks for unmasked numeric exceptions; FNSTSW does not.

4.8.5 FSTSW AX/FNSTSW AX

FSTSW AX/FNSTSW AX (store status word to AX) is a special 80387 instruction that writes the current value of the 80387 status word directly into the 80386 AX register. This instruction optimizes conditional branching in numeric programs, where the 80386 CPU must test the condition of various NPX status bits. The waited form FSTSW AX checks for unmasked numeric exceptions; the non-waited form FNSTSW AX does not.

When this instruction is executed, the 80386 AX register is updated with the NPX status word before the CPU executes any further instructions. The status stored is that from the completion of the prior ESC instruction.

4.8.6 FCLEX/FNCLEX

FCLEX/FNCLEX (clear exceptions) clears all exception flags, the exception status flag, and the busy flag in the status word.
As a consequence, the 80387's ERROR# line goes inactive. FCLEX checks for unmasked numeric exceptions; FNCLEX does not.

4.8.7 FSAVE/FNSAVE destination

FSAVE/FNSAVE (save state) writes the full 80387 state (environment plus register stack) to the memory location defined by the destination operand. Figure 4-1 and Figure 4-2 show the layout of the save area; the size and layout of the save area depend on the operating mode of the 80386 (real-address mode or protected mode) and on the operand-size attribute in effect for the instruction (32-bit operand or 16-bit operand). When the 80386 is in virtual-8086 mode, the real-address mode formats are used. Typically the instruction is coded to save this image on the CPU stack.

The values in the tag word in memory are determined during the execution of FSAVE/FNSAVE. If the tag in the status register indicates that the corresponding register is nonempty, the 80387 examines the data in the register and stores the appropriate tag in memory. Thus the tag that is stored always reflects the actual content of the register.

FNSAVE delays its execution until all NPX activity completes normally. Thus, the save image reflects the state of the NPX following the completion of any running instruction. After writing the state image to memory, FSAVE/FNSAVE initializes the 80387 as if FINIT/FNINIT had been executed.

FSAVE/FNSAVE is useful whenever a program wants to save the current state of the NPX and initialize it for a new routine. Three examples are

1. An operating system needs to perform a context switch (suspend the task that had been running and give control to a new task).
2. An exception handler needs to use the 80387.
3. An application task wants to pass a "clean" 80387 to a subroutine.

FSAVE checks for unmasked numeric exceptions before executing; FNSAVE does not.

Figure 4-1.
FSAVE/FRSTOR Memory Layout (32-Bit)

    Offset   Contents
    +0H      Environment (seven doublewords, +0H through +18H)
    +1CH     ST(0)
    +26H     ST(1)
    +30H     ST(2)
    +3AH     ST(3)
    +44H     ST(4)
    +4EH     ST(5)
    +58H     ST(6)
    +62H     ST(7)

    Each register image is 10 bytes: sign (bit 79), exponent (bits 78..64), significand (bits 63..0).

Figure 4-2. FSAVE/FRSTOR Memory Layout (16-Bit)

    Offset   Contents
    +0H      Environment (seven words, +0H through +CH)
    +EH      ST(0)
    +18H     ST(1)
    +22H     ST(2)
    +2CH     ST(3)
    +36H     ST(4)
    +40H     ST(5)
    +4AH     ST(6)
    +54H     ST(7)

    Each register image is 10 bytes: sign (bit 79), exponent (bits 78..64), significand (bits 63..0).

4.8.8 FRSTOR source

FRSTOR (restore state) reloads the 80387 state from the memory area defined by the source operand. This information should have been written by a previous FSAVE/FNSAVE instruction and not altered by any other instruction.
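The offsets in Figures 4-1 and 4-2 follow a simple rule: the environment comes first (28 bytes in the 32-bit format, 14 bytes in the 16-bit format), followed by the eight 10-byte register images in stack order. The C sketch below, with hypothetical function names, computes those offsets; it illustrates the layout arithmetic only and does not access the coprocessor.

```c
#include <stddef.h>

/* Hypothetical helpers describing the FSAVE image of Figures 4-1/4-2. */
enum { REG_IMAGE_BYTES = 10, NUM_STACK_REGS = 8 };

/* Environment size: ST(0) starts at 1CH (32-bit) or EH (16-bit). */
static size_t env_bytes(int is_32_bit)
{
    return is_32_bit ? 28u : 14u;
}

/* Offset of ST(i); registers follow the environment in stack order. */
static size_t st_offset(int is_32_bit, unsigned i)
{
    return env_bytes(is_32_bit) + (size_t)i * REG_IMAGE_BYTES;
}

/* Total image size: the offset just past ST(7). */
static size_t save_area_bytes(int is_32_bit)
{
    return st_offset(is_32_bit, NUM_STACK_REGS);
}
```

The totals, 108 bytes for the 32-bit image and 94 bytes for the 16-bit image, are the amounts of memory an FSAVE destination operand must provide in each format.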
FRSTOR automatically waits, checking for interrupts, until all data transfers are completed before continuing to the next instruction. Note that the 80387 "reacts" to its new state at the conclusion of the FRSTOR. It generates an exception request, for example, if the exception and mask bits in the memory image so indicate when the next WAIT or exception-checking ESC instruction is executed.

4.8.9 FSTENV/FNSTENV destination

FSTENV/FNSTENV (store environment) writes the 80387's basic status (control, status, and tag words, and exception pointers) to the memory location defined by the destination operand. Typically, the environment is saved on the CPU stack. FSTENV/FNSTENV is often used by exception handlers because it provides access to the exception pointers that identify the offending instruction and operand. After saving the environment, FSTENV/FNSTENV sets all exception masks in the 80387 control word (i.e., masks all exceptions). FSTENV checks for pending exceptions before executing; FNSTENV does not.

Figures 4-3 through 4-6 show the format of the environment data in memory; the size and layout of the save area depend on the operating mode of the 80386 (real-address mode or protected mode) and on the operand-size attribute in effect for the instruction (32-bit operand or 16-bit operand). When the 80386 is in virtual-8086 mode, the real-address mode formats are used.

FNSTENV does not store the environment until all NPX activity has completed. Thus, the data saved by the instruction reflects the 80387 after any previously decoded instruction has been executed. The values in the tag word in memory are determined during the execution of FNSTENV/FSTENV. If the tag in the status register indicates that the corresponding register is nonempty, the 80387 examines the data in the register and stores the appropriate tag in memory. Thus the tag that is stored always reflects the actual content of the register.

Figure 4-3.
Protected Mode 80387 Environment, 32-Bit Format

    32-BIT PROTECTED MODE FORMAT (fields listed from bit 31 down to bit 0)
    0H    Reserved | Control Word
    4H    Reserved | Status Word
    8H    Reserved | Tag Word
    CH    IP Offset
    10H   0 0 0 0 0 | Opcode 10..0 | CS Selector
    14H   Data Operand Offset
    18H   Reserved | Operand Selector

Figure 4-4. Real Mode 80387 Environment, 32-Bit Format

    32-BIT REAL-ADDRESS MODE FORMAT (fields listed from bit 31 down to bit 0)
    0H    Reserved | Control Word
    4H    Reserved | Status Word
    8H    Reserved | Tag Word
    CH    Reserved | Instruction Pointer 15..0
    10H   0 0 0 0 | Instruction Pointer 31..16 | 0 | Opcode 10..0
    14H   Reserved | Operand Pointer 15..0
    18H   0 0 0 0 | Operand Pointer 31..16 | 0 0 0 0 0 0 0 0 0 0 0 0

Figure 4-5.
Protected Mode 80387 Environment, 16-Bit Format

    16-BIT PROTECTED MODE FORMAT (bits 15..0)
    0H    Control Word
    2H    Status Word
    4H    Tag Word
    6H    IP Offset
    8H    CS Selector
    AH    Operand Offset
    CH    Operand Selector

Figure 4-6. Real Mode 80387 Environment, 16-Bit Format

    16-BIT REAL-ADDRESS MODE AND VIRTUAL-8086 MODE FORMAT (fields listed from bit 15 down to bit 0)
    0H    Control Word
    2H    Status Word
    4H    Tag Word
    6H    Instruction Pointer 15..0
    8H    IP 19..16 | 0 | Opcode 10..0
    AH    Operand Pointer 15..0
    CH    OP 19..16 | 0 0 0 0 0 0 0 0 0 0 0 0

4.8.10 FLDENV source

FLDENV (load environment) reloads the environment from the memory area defined by the source operand. This data should have been written by a previous FSTENV/FNSTENV instruction. CPU instructions (that do not reference the environment image) may immediately follow FLDENV. FLDENV automatically waits for all data transfers to complete before executing the next instruction. Note that loading an environment image that contains an unmasked exception causes a numeric exception when the next WAIT or exception-checking ESC instruction is executed.

4.8.11 FINCSTP

FINCSTP (increment NPX stack pointer) adds 1 to the stack top pointer (TOP) in the status word. It does not alter tags or register contents, nor does it transfer data. It is not equivalent to popping the stack, because it does not set the tag of the previous stack top to empty. Incrementing the stack pointer when ST=7 produces ST=0.

4.8.12 FDECSTP

FDECSTP (decrement NPX stack pointer) subtracts 1 from ST, the stack top pointer in the status word.
No tags or registers are altered, nor is any data transferred. Executing FDECSTP when ST=0 produces ST=7.

4.8.13 FFREE destination

FFREE (free register) changes the destination register's tag to empty; the content of the register is unaffected.

4.8.14 FNOP

FNOP (no operation) effectively performs no operation.

4.8.15 FWAIT (CPU Instruction)

FWAIT is not actually an 80387 instruction, but an alternate mnemonic for the 80386 WAIT instruction. The FWAIT or WAIT mnemonic should be coded whenever the programmer wants to check for a pending error before modifying a variable used in the previous floating-point instruction. Coding an FWAIT instruction after an 80387 instruction ensures that unmasked numeric exceptions occur and exception handlers are invoked before the next instruction has a chance to examine the results of the 80387 instruction. More information on when to code an FWAIT instruction is given in Chapter 5 in the section "Concurrent Processing with the 80387."

Chapter 5 Programming Numeric Applications

5.1 Programming Facilities

As described previously, the 80387 NPX is programmed simply as an extension of the 80386 CPU. This section describes how programmers in ASM386 and in a variety of higher-level languages can work with the 80387.

The level of detail in this section is intended to give programmers a basic understanding of the software tools that can be used with the 80387, but this information does not document the full capabilities of these facilities. Complete documentation is available with each program development product.

5.1.1 High-Level Languages

For programmers using high-level languages, the programming and operation of the NPX is handled automatically by the compiler. A variety of Intel high-level languages are available that automatically make use of the 80387 NPX when appropriate. These languages include C-386 and PL/M-386.
In addition, many high-level language compilers are available from independent software vendors. Each of these high-level languages has special numeric libraries allowing programs to take advantage of the capabilities of the 80387 NPX. No special programming conventions are necessary to make use of the 80387 NPX when programming numeric applications in any of these languages.

Programmers in PL/M-386 and ASM386 can also make use of many of these library routines by using routines contained in the 80387 Support Library. These libraries implement many of the functions provided by higher-level languages, including exception handlers, ASCII-to-floating-point conversions, and a more complete set of transcendental functions than that provided by the 80387 instruction set.

5.1.2 C Programs

C programmers automatically cause the C compiler to generate 80387 instructions when they use the double and float data types. The float type corresponds to the 80387's single real format; the double type corresponds to the 80387's double real format. The statement #include <math.h> declares mathematical functions such as sin and sqrt so that they return values of type double. Figure 5-1 illustrates the ease with which C programs interface with the 80387.

Figure 5-1.
Sample C-386 Program

    XENIX286 C386 COMPILER, V0.2
    COMPILATION OF MODULE SAMPLE
    OBJECT MODULE PLACED IN sample.obj
    COMPILER INVOKED BY: c386 sample.c

    /******************************************************
     *                                                    *
     *                 SAMPLE C PROGRAM                   *
     *                                                    *
     ******************************************************/

    /** Include /usr/include/stdio.h if necessary **/
    /** Include math declarations for transcendentals and others **/

    #include <stdio.h>
    #include <math.h>
    #define PI 3.141592654

    int main(void)
    {
        double sin_result, cos_result;
        double angle_deg = 0.0, angle_rad;
        int i, no_of_trial = 4;

        for (i = 1; i <= no_of_trial; i++) {
            angle_rad = angle_deg * PI / 180.0;
            sin_result = sin(angle_rad);
            cos_result = cos(angle_rad);
            printf("sine of %f degrees equals %f\n", angle_deg, sin_result);
            printf("cosine of %f degrees equals %f\n\n", angle_deg, cos_result);
            angle_deg = angle_deg + 30.0;
        }
        /** etc. **/
        return 0;
    }

    C386 COMPILATION COMPLETE. 0 WARNINGS, 0 ERRORS

5.1.3 PL/M-386

Programmers in PL/M-386 can access a very useful subset of the 80387's numeric capabilities. The PL/M-386 REAL data type corresponds to the NPX's single real (32-bit) format. This data type provides a range of about 8.43 * 10^(-37) ≤ X ≤ 3.38 * 10^(38), with about seven significant decimal digits. This representation is adequate for the data manipulated by many microcomputer applications.

The utility of the REAL data type is extended by the PL/M-386 compiler's practice of holding intermediate results in the 80387's extended real format. This means that the full range and precision of the processor are utilized for intermediate results. Underflow, overflow, and rounding exceptions are most likely to occur during intermediate computations rather than during calculation of an expression's final result.
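The effect of intermediate precision can be reproduced in C. In the hypothetical sketch below, double stands in for the 80387's wider extended real format, and a volatile float variable forces the intermediate product to be rounded to single precision, as it would be if every intermediate result were kept in a REAL variable. The function names are illustrative only.

```c
#include <float.h>

/* Hypothetical illustration. Storing into a volatile float forces the
   product to be rounded to single precision, so a large x overflows. */
static float square_then_divide_single(float x)
{
    volatile float product = x * x;   /* may overflow to infinity */
    return product / x;
}

/* Same expression with the intermediate kept in double, standing in for
   the 80387's extended real format. For nonzero finite x the square is
   exact in double, so the quotient recovers x exactly. */
static float square_then_divide_wide(float x)
{
    double product = (double)x * (double)x;
    return (float)(product / x);
}

/* Nonzero if v is outside the finite single real range (e.g. infinity). */
static int overflowed(float v)
{
    return v > FLT_MAX || v < -FLT_MAX;
}
```

With x = 1.0E30, x * x exceeds the single real range, so the single-precision version returns infinity, while the wide-intermediate version returns x exactly.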
Holding intermediate results in extended-precision real format greatly reduces the likelihood of overflow and underflow and eliminates roundoff as a serious source of error until the final assignment of the result is performed.

The compiler generates 80387 code to evaluate expressions that contain REAL data types, whether variables or constants or both. This means that addition, subtraction, multiplication, division, comparison, and assignment of REALs will be performed by the NPX. INTEGER expressions, on the other hand, are evaluated on the CPU.

Five built-in procedures (Table 5-1) give the PL/M-386 programmer access to 80387 functions manipulated by the processor control instructions. Prior to any arithmetic operations, a typical PL/M-386 program will set up the NPX using the INIT$REAL$MATH$UNIT procedure and then issue SET$REAL$MODE to configure the NPX. SET$REAL$MODE loads the 80387 control word, and its 16-bit parameter has the format shown for the control word in Chapter 2. The recommended value of this parameter is 033EH (round to nearest, 64-bit precision, all exceptions masked except invalid operation). Other settings may be used at the programmer's discretion.

If any exceptions are unmasked, an exception handler must be provided in the form of an interrupt procedure that is designated to be invoked via CPU interrupt vector number 16. The exception handler can use the GET$REAL$ERROR procedure to obtain the low-order byte of the 80387 status word and to then clear the exception flags. The byte returned by GET$REAL$ERROR contains the exception flags; these can be examined to determine the source of the exception.

The SAVE$REAL$STATUS and RESTORE$REAL$STATUS procedures are provided for multitasking environments where a running task that uses the 80387 may be preempted by another task that also uses the 80387.
It is the responsibility of the operating system to issue SAVE$REAL$STATUS before it executes any statements that affect the 80387; these include the INIT$REAL$MATH$UNIT and SET$REAL$MODE procedures as well as arithmetic expressions. SAVE$REAL$STATUS saves the 80387 state (registers, status, and control words, etc.) on the CPU's stack. RESTORE$REAL$STATUS reloads the state information; the preempting task must invoke this procedure before terminating in order to restore the 80387 to its state at the time the running task was preempted. This enables the preempted task to resume execution from the point of its preemption.

Table 5-1. PL/M-386 Built-In Procedures

    Procedure              80387 Instruction   Description
    INIT$REAL$MATH$UNIT    FINIT               Initialize processor.
    SET$REAL$MODE          FLDCW               Set exception masks and rounding, precision, and infinity controls.
    GET$REAL$ERROR         FNSTSW & FNCLEX     Store, then clear, exception flags.
    SAVE$REAL$STATUS       FNSAVE              Save processor state.
    RESTORE$REAL$STATUS    FRSTOR              Restore processor state.

5.1.4 ASM386

The ASM386 assembly language provides programmers with complete access to all of the facilities of the 80386 and 80387 processors. The programmer's view of the 80386/80387 hardware is a single machine with these resources:

• 160 instructions
• 12 data types
• 8 general registers
• 6 segment registers
• 8 floating-point registers, organized as a stack

5.1.4.1 Defining Data

The ASM386 directives shown in Table 5-2 allocate storage for 80387 variables and constants. As with other storage allocation directives, the assembler associates a type with any variable defined with these directives. The type value is equal to the length of the storage unit in bytes (10 for DT, 8 for DQ, etc.). The assembler checks the type of any variable coded in an instruction to be certain that it is compatible with the instruction.
For example, the coding FIADD ALPHA will be flagged as an error if ALPHA's type is not 2 or 4, because integer addition is only available for word and short integer (doubleword) data types. The operand's type also tells the assembler which machine instruction to produce; although to the programmer there is only an FIADD instruction, a different machine instruction is required for each operand type.

On occasion it is desirable to use an instruction with an operand that has no declared type. For example, if register BX points to a short integer variable, a programmer may want to code FIADD [BX]. This can be done by informing the assembler of the operand's type in the instruction, coding FIADD DWORD PTR [BX]. The corresponding overrides for the other storage allocations are WORD PTR, QWORD PTR, and TBYTE PTR.

The assembler does not, however, check the types of operands used in processor control instructions. Coding FRSTOR [BP] implies that the programmer has set up register BP to point to the location (probably in the stack) where the processor's state record (94 bytes in the 16-bit format, 108 bytes in the 32-bit format) has been previously saved.

The initial values for 80387 constants may be coded in several different ways. Binary integer constants may be specified as bit strings, decimal integers, octal integers, or hexadecimal strings. Packed decimal values are normally written as decimal integers, although the assembler will accept and convert other representations of integers. Real values may be written as ordinary decimal real numbers (decimal point required), as decimal numbers in scientific notation, or as hexadecimal strings. Using hexadecimal strings is primarily intended for defining special values such as infinities, NaNs, and denormalized numbers. Most programmers will find that ordinary decimal and scientific decimal provide the simplest way to initialize 80387 constants. Figure 5-2 compares several ways of setting the various 80387 data types to the same initial value.
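The binary encodings used for -126 in Figure 5-2 can be cross-checked on a host whose float and double types are the IEEE single and double real formats (an assumption, noted in the code). The C sketch below uses hypothetical helper names: it reads back the integer and real encodings of -126 and converts the double real encoding of a normal number into the 80-bit extended real layout (sign and 15-bit exponent with bias 16383, followed by a 64-bit significand with an explicit integer bit).

```c
#include <stdint.h>
#include <string.h>

/* Two's complement word and short (doubleword) integer images of -126. */
static uint16_t word_integer_bits(void)  { return (uint16_t)(int16_t)-126; }
static uint32_t short_integer_bits(void) { return (uint32_t)(int32_t)-126; }

static uint32_t single_real_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* assumes IEEE 754 single format */
    return u;
}

static uint64_t double_real_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* assumes IEEE 754 double format */
    return u;
}

/* Extended real image of a normal, nonzero double: sign and exponent go
   in *sign_exp, the 64-bit significand (explicit integer bit) in
   *significand. Zeros, denormals, infinities, and NaNs are not handled. */
static void extended_real_from_double(double d, uint16_t *sign_exp,
                                      uint64_t *significand)
{
    uint64_t bits = double_real_bits(d);
    unsigned sign = (unsigned)(bits >> 63);
    unsigned exp = (unsigned)((bits >> 52) & 0x7FFu);   /* bias 1023 */
    uint64_t frac = bits & ((1ull << 52) - 1);           /* 52 bits */

    *sign_exp = (uint16_t)((sign << 15) | (exp - 1023u + 16383u));
    *significand = (1ull << 63) | (frac << 11);
}
```

For -126 these helpers produce FF82H, FFFFFF82H, C2FC0000H, and C05F800000000000H, and the extended real image has sign and exponent C005H and significand FC00000000000000H.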
Note that preceding 80387 variables and constants with the ASM386 EVEN directive ensures that the operands will be word-aligned in memory. The best performance is obtained when data transfers are doubleword aligned. All 80387 data types occupy integral numbers of words so that no storage is "wasted" if blocks of variables are defined together and preceded by a single EVEN declarative.

Table 5-2. ASM386 Storage Allocation Directives

    Directive   Interpretation      Data Types
    DW          Define Word         Word integer
    DD          Define Doubleword   Short integer, short real
    DQ          Define Quadword     Long integer, long real
    DT          Define Tenbyte      Packed decimal, temporary real

Figure 5-2. Sample 80387 Constants

    ; THE FOLLOWING ALL ALLOCATE THE CONSTANT: -126
    ; NOTE TWO'S COMPLEMENT STORAGE OF NEGATIVE BINARY INTEGERS.
    ;
                    EVEN                        ; FORCE WORD ALIGNMENT
    WORD_INTEGER    DW 1111111110000010B        ; BIT STRING
    SHORT_INTEGER   DD 0FFFFFF82H               ; HEX STRING MUST START
                                                ; WITH DIGIT
    LONG_INTEGER    DQ -126                     ; ORDINARY DECIMAL
    SINGLE_REAL     DD -126.0                   ; NOTE PRESENCE OF '.'
    DOUBLE_REAL     DQ -1.26E2                  ; "SCIENTIFIC"
    PACKED_DECIMAL  DT -126                     ; ORDINARY DECIMAL INTEGER
    ;
    ; IN THE FOLLOWING, SIGN AND EXPONENT IS 'C005'
    ; SIGNIFICAND IS 'FC00...00', 'R' INFORMS ASSEMBLER THAT
    ; THE STRING REPRESENTS A REAL DATA TYPE.
    ;
    EXTENDED_REAL   DT 0C005FC000000000000000R  ; HEX STRING

5.1.4.2 Records and Structures

The ASM386 RECORD and STRUC (structure) declaratives can be very useful in NPX programming. The record facility can be used to define the bit fields of the control, status, and tag words. Figure 5-3 shows one definition of the status word and how it might be used in a routine that polls the 80387 until it has completed an instruction.

Because structures allow different but related data types to be grouped together, they often provide a natural way to represent "real world" data organizations. The fact that the structure template may be "moved" about in memory adds to its flexibility.
Figure 5-4 shows a simple structure that might be used to represent data consisting of a series of test score samples. A structure could also be used to define the organization of the information stored and loaded by the FSTENV and FLDENV instructions.

Figure 5-3. Status Word Record Definition

    ; RESERVE SPACE FOR STATUS WORD
    STATUS_WORD     DW      ?

    ; LAY OUT STATUS WORD FIELDS
    STATUS          RECORD                  &
                    BUSY:           1,      &
                    COND_CODE3:     1,      &
                    STACK_TOP:      3,      &
                    COND_CODE2:     1,      &
                    COND_CODE1:     1,      &
                    COND_CODE0:     1,      &
                    INT_REQ:        1,      &
                    S_FLAG:         1,      &
                    P_FLAG:         1,      &
                    U_FLAG:         1,      &
                    O_FLAG:         1,      &
                    Z_FLAG:         1,      &
                    D_FLAG:         1,      &
                    I_FLAG:         1

    ; REDUCE UNTIL COMPLETE
    REDUCE:         FPREM1
                    FNSTSW  STATUS_WORD
                    TEST    STATUS_WORD, MASK COND_CODE2
                    JNZ     REDUCE

Figure 5-4. Structure Definition

    SAMPLE          STRUC
    N_OBS           DD      ?               ; SHORT INTEGER
    MEAN            DQ      ?               ; DOUBLE REAL
    MODE            DW      ?               ; WORD INTEGER
    STD_DEV         DQ      ?               ; DOUBLE REAL
    ; ARRAY OF OBSERVATIONS -- WORD INTEGER
    TEST_SCORES     DW      1000 DUP (?)
    SAMPLE          ENDS

5.1.4.3 Addressing Methods

80387 memory data can be accessed with any of the memory addressing methods provided by the ModR/M byte and (optionally) the SIB byte. This means that 80387 data types can be incorporated in data aggregates ranging from simple to complex according to the needs of the application. The addressing methods and the ASM386 notation used to specify them in instructions make the accessing of structures, arrays, arrays of structures, and other organizations direct and straightforward. Table 5-3 gives several examples of 80387 instructions coded with operands that illustrate different addressing methods.

Table 5-3. Addressing Method Examples

    Coding                        Interpretation
    FIADD  ALPHA                  ALPHA is a simple scalar (mode is direct).
    FDIVR  ALPHA.BETA             BETA is a field in a structure that is
                                  "overlaid" on ALPHA (mode is direct).
    FMUL   QWORD PTR [BX]         BX contains the address of a long real
                                  variable (mode is register indirect).
    FSUB   ALPHA [SI]             ALPHA is an array and SI contains the offset
                                  of an array element from the start of the
                                  array (mode is indexed).
    FILD   [BP].BETA              BP contains the address of a structure on
                                  the CPU stack and BETA is a field in the
                                  structure (mode is based).
    FBLD   TBYTE PTR [BX] [DI]    BX contains the address of a packed decimal
                                  array and DI contains the offset of an array
                                  element (mode is based indexed).

5.1.5 Comparative Programming Example

Figures 5-5 and 5-6 show the PL/M-386 and ASM386 code for a simple 80387 program called ARRSUM. The program references an array (X$ARRAY) that contains 0-100 single real values; the integer variable N$OF$X indicates the number of array elements the program is to consider. ARRSUM steps through X$ARRAY accumulating three sums:

• SUM$X, the sum of the array values
• SUM$INDEXES, the sum of each array value times its index, where the index of the first element is 1, the second is 2, etc.
• SUM$SQUARES, the sum of each array element squared

(A true program, of course, would go beyond these steps to store and use the results of these calculations.) The control word is set with the recommended values: round to nearest, 64-bit precision, interrupts enabled, and all exceptions masked except invalid operation. It is assumed that an exception handler has been written to field the invalid operation if it occurs, and that it is invoked through interrupt vector 16. Either version of the program will run on an actual or an emulated 80387 without altering the code shown.

The PL/M-386 version of ARRSUM (Figure 5-5) is very straightforward and illustrates how easily the 80387 can be used in this language. After declaring variables, the program calls built-in procedures to initialize the processor (or its emulator) and to load the control word. The program clears the sum variables and then steps through X$ARRAY with a DO-loop. The loop control takes into account PL/M-386's practice of considering the index of the first element of an array to be 0.
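The control word 033EH used by both versions can be decoded field by field to confirm the settings listed above. The Python sketch below assumes the 80387 layout of exception masks in bits 0-5, precision control in bits 8-9, and rounding control in bits 10-11; it ignores bit 12 (the 80287's infinity control, which the 80387 does not use) and the reserved bits. The names are illustrative only.

```python
# Bit positions assumed from the 80387 control word layout.
EXCEPTION_MASKS = {0: "invalid operation", 1: "denormal", 2: "zero divide",
                   3: "overflow", 4: "underflow", 5: "precision"}
PRECISION_CONTROL = {0b00: "24 bits", 0b01: "reserved",
                     0b10: "53 bits", 0b11: "64 bits"}
ROUNDING_CONTROL = {0b00: "round to nearest", 0b01: "round down (toward -infinity)",
                    0b10: "round up (toward +infinity)", 0b11: "chop (toward zero)"}

def decode_control_word(cw: int) -> dict:
    """Split a 16-bit control word into its exception-mask, PC, and RC fields."""
    return {
        # A mask bit of 1 means the exception is masked (default fix-up).
        "masked": [name for bit, name in EXCEPTION_MASKS.items() if (cw >> bit) & 1],
        "unmasked": [name for bit, name in EXCEPTION_MASKS.items() if not (cw >> bit) & 1],
        "precision": PRECISION_CONTROL[(cw >> 8) & 0b11],
        "rounding": ROUNDING_CONTROL[(cw >> 10) & 0b11],
    }

if __name__ == "__main__":
    for key, value in decode_control_word(0x033E).items():
        print(f"{key:>9}: {value}")
```

For 033EH only bit 0 of the low byte is clear, so invalid operation is the only unmasked exception; the high byte 03H selects 64-bit precision and round to nearest.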
In the computation of SUM$INDEXES, the built-in procedure FLOAT converts I+1 from integer to real because the language does not support "mixed mode" arithmetic. One of the strengths of the NPX, of course, is that it does support arithmetic on mixed data types (because all values are converted internally to the 80-bit extended-precision real format).

The ASM386 version (Figure 5-6) declares the external procedure INIT387, which makes the different initialization requirements of the processor and its emulator transparent to the source code. After defining the data and setting up the segment registers and stack pointer, the program calls INIT387 and loads the control word. The computation begins with the next three instructions, which clear three registers by loading (pushing) zeros onto the stack. As shown in Figure 5-7, these registers remain at the bottom of the stack throughout the computation while temporary values are pushed on and popped off the stack above them.

The program uses the CPU LOOP instruction to control its iteration through X_ARRAY; register ECX, which LOOP automatically decrements, is loaded with N_OF_X, the number of array elements to be summed. Register ESI is used to select (index) the array elements. The program steps through X_ARRAY from back to front, so ESI is initialized to point just beyond the first element to be processed (the last element of the array). The ASM386 TYPE operator is used to determine the number of bytes in each array element. This permits changing X_ARRAY to a double-precision real array by simply changing its definition (DD to DQ) and reassembling. Because the loop decrements N_OF_X along with ECX, N_OF_X always holds the 1-based index of the element being processed, so FIMUL N_OF_X forms the element-times-index product that is added into SUM_INDEXES.

Figure 5-7 shows the effect of the instructions in the program loop on the NPX register stack. The figure assumes that the program is in its first iteration, that N_OF_X is 20, and that X_ARRAY(19) (the 20th element) contains the value 2.5. When the loop terminates, the three sums are left as the top stack elements, so the program ends by simply popping them into memory variables.
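The loop's effect on the register stack can be checked with a small Python model. It is a sketch, not an emulator: the class and function names are hypothetical, ST(0) is kept at index 0 of a Python list, only the operations ARRSUM uses are modeled, and rounding and exceptions are ignored. The loop body follows the instruction sequence traced in Figure 5-7 (FLD, FADD ST(3),ST, FLD ST, FMUL ST,ST, FADDP ST(2),ST, FIMUL N_OF_X, FADDP ST(2),ST, then DEC N_OF_X).

```python
class RegisterStack:
    """Toy NPX register stack: ST(0) is self.st[0], ST(1) is self.st[1], ..."""

    def __init__(self):
        self.st = []

    def fldz(self):                 # FLDZ: push +0.0
        self.st.insert(0, 0.0)

    def fld(self, value):           # FLD mem, or FLD ST when given self.st[0]
        self.st.insert(0, float(value))

    def fadd_sti_st(self, i):       # FADD ST(i), ST
        self.st[i] += self.st[0]

    def fmul_st_st(self):           # FMUL ST, ST
        self.st[0] *= self.st[0]

    def faddp_sti_st(self, i):      # FADDP ST(i), ST: add, then pop ST(0)
        self.st[i] += self.st[0]
        self.st.pop(0)

    def fimul(self, integer):       # FIMUL mem-integer
        self.st[0] *= integer

    def fstp(self):                 # FSTP mem: return ST(0) and pop it
        return self.st.pop(0)


def arrsum(x_array, n_of_x):
    """Run the ARRSUM loop; return (sum_x, sum_indexes, sum_squares, trace)."""
    npx = RegisterStack()
    for _ in range(3):              # SUM_X, SUM_INDEXES, SUM_SQUARES (on top)
        npx.fldz()
    trace = []                      # stack contents after each iteration
    for esi in range(n_of_x - 1, -1, -1):    # back to front, as ESI does
        npx.fld(x_array[esi])
        npx.fadd_sti_st(3)          # SUM_X += x
        npx.fld(npx.st[0])          # duplicate x
        npx.fmul_st_st()            # x * x
        npx.faddp_sti_st(2)         # SUM_SQUARES += x * x; pop
        npx.fimul(n_of_x)           # x * (1-based index)
        npx.faddp_sti_st(2)         # SUM_INDEXES += x * index; pop
        n_of_x -= 1                 # DEC N_OF_X
        trace.append(list(npx.st))
    sum_squares = npx.fstp()
    sum_indexes = npx.fstp()
    sum_x = npx.fstp()
    return sum_x, sum_indexes, sum_squares, trace
```

With the figure's assumptions (N_OF_X = 20 and X_ARRAY(19) = 2.5, and, for this example, all other elements zero), trace[0] is [6.25, 50.0, 2.5], the same ST(0)-ST(2) contents as the last panel of Figure 5-7. Walking the array back to front while decrementing N_OF_X gives the same sums as the forward PL/M-386 loop.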
Figure 5-5. Sample PL/M-386 Program

    /***********************************************************
     *                                                         *
     *                   ARRAYSUM MODULE                       *
     *                                                         *
     ***********************************************************/

    array$sum: do;

        declare (sum$x, sum$indexes, sum$squares) real;
        declare x$array(100) real;
        declare (n$of$x, i) integer;
        declare control$387 literally '033eh';

        /* Assume x$array and n$of$x are initialized */

        /* Prepare the 80387 or its emulator */
        call init$real$math$unit;
        call set$real$mode(control$387);

        /* Clear sums */
        sum$x, sum$indexes, sum$squares = 0.0;

        /* Loop through array, accumulating sums */
        do i = 0 to n$of$x - 1;
            sum$x = sum$x + x$array(i);
            sum$indexes = sum$indexes + (x$array(i) * float(i + 1));
            sum$squares = sum$squares + (x$array(i) * x$array(i));
        end;

        /* etc. */

    end array$sum;

Figure 5-6. Sample ASM386 Program

            name    arraysum

    ; Define initialization routine
            extrn   init387:far

    ; Allocate space for data
    data    segment rw public
    control_387     dw      033eh
    n_of_x          dd      ?
    x_array         dd      100 dup (?)
    sum_squares     dd      ?
    sum_indexes     dd      ?
    sum_x           dd      ?
    data    ends

    ; Allocate CPU stack space
    stack   stackseg 400

    ; Begin code
    code    segment er public
            assume  ds:data, ss:stack

    start:
            mov     ax, data
            mov     ds, ax
            mov     ax, stack
            mov     ss, ax
            mov     esp, stackstart stack

    ; Assume x_array and n_of_x have been initialized

    ; Prepare the 80387 or its emulator
            call    init387
            fldcw   control_387

    ; Clear three registers to hold running sums
            fldz
            fldz
            fldz

    ; Set up ECX as loop counter and ESI as index into x_array
            mov     ecx, n_of_x
            mov     eax, type x_array
            imul    ecx
            mov     esi, eax
    ; ESI now contains index of last element + 1

    ; Loop through x_array and accumulate sums
    sum_next:
    ; Back up one element and push it on the stack
            sub     esi, type x_array
            fld     x_array[esi]

    ; Add to the sum and duplicate x on the stack
            fadd    st(3), st
            fld     st

    ; Square it, add into the sum of squares, and discard
            fmul    st, st
            faddp   st(2), st

    ; Multiply x by its index (n_of_x), add into the
    ; sum of indexes, and discard
            fimul   n_of_x
            faddp   st(2), st

    ; Reduce index for next iteration
            dec     n_of_x
            loop    sum_next

    ; Pop sums into memory
    pop_results:
            fstp    sum_squares
            fstp    sum_indexes
            fstp    sum_x
            fwait

    ;
    ; Etc.
    ;
    code    ends
            end     start, ds:data, ss:stack

Figure 5-7. Instructions and Register Stack

    FLDZ, FLDZ, FLDZ                      FLD X_ARRAY[ESI]
      ST(0)  0.0   SUM_SQUARES              ST(0)  2.5   X_ARRAY(19)
      ST(1)  0.0   SUM_INDEXES              ST(1)  0.0   SUM_SQUARES
      ST(2)  0.0   SUM_X                    ST(2)  0.0   SUM_INDEXES
                                            ST(3)  0.0   SUM_X

    FADD ST(3), ST                        FLD ST
      ST(0)  2.5   X_ARRAY(19)              ST(0)  2.5   X_ARRAY(19)
      ST(1)  0.0   SUM_SQUARES              ST(1)  2.5   X_ARRAY(19)
      ST(2)  0.0   SUM_INDEXES              ST(2)  0.0   SUM_SQUARES
      ST(3)  2.5   SUM_X                    ST(3)  0.0   SUM_INDEXES
                                            ST(4)  2.5   SUM_X

    FMUL ST, ST                           FADDP ST(2), ST
      ST(0)  6.25  X_ARRAY(19)²             ST(0)  2.5   X_ARRAY(19)
      ST(1)  2.5   X_ARRAY(19)              ST(1)  6.25  SUM_SQUARES
      ST(2)  0.0   SUM_SQUARES              ST(2)  0.0   SUM_INDEXES
      ST(3)  0.0   SUM_INDEXES              ST(3)  2.5   SUM_X
      ST(4)  2.5   SUM_X

    FIMUL N_OF_X                          FADDP ST(2), ST
      ST(0)  50.0  X_ARRAY(19)*20           ST(0)  6.25  SUM_SQUARES
      ST(1)  6.25  SUM_SQUARES              ST(1)  50.0  SUM_INDEXES
      ST(2)  0.0   SUM_INDEXES              ST(2)  2.5   SUM_X
      ST(3)  2.5   SUM_X

5.1.6 80387 Emulation

Writing applications that run both on 80386 systems with an 80387 and on 80386 systems without one is made much easier by the existence of an 80387 emulator for 80386 systems. The Intel EMUL387 emulator offers a complete software counterpart to the 80387 hardware; NPX instructions are simply emulated in software rather than executed in hardware. With software emulation, the distinction between 80386 systems with and without an 80387 is reduced to a simple performance differential. Identical numeric programs simply execute more slowly on 80386 systems without an 80387 (using software emulation of NPX instructions) than on an 80386/80387 system that executes NPX instructions directly.
When incorporated into the systems software, the emulation of NPX instructions on 80386 systems is completely transparent to the applications programmer. Applications software needs no special libraries, linking, or other activity to allow it to run on an 80386 with 80387 emulation. To the applications programmer, the development of programs for 80386 systems is the same whether the 80387 NPX hardware is available or not. The full 80387 instruction set is available for use, with NPX instructions being either emulated or executed directly. Applications programmers need not be concerned with the hardware configuration of the computer systems on which their applications will eventually run.

For systems programmers, details relating to 80387 emulators are described in Chapter 6. The EMUL387 software emulator for 80386 systems is available from Intel as a separate program product.

5.2 Concurrent Processing With the 80387

Because the 80386 CPU and the 80387 NPX have separate execution units, it is possible for the NPX to execute numeric instructions in parallel with instructions executed by the CPU. This simultaneous execution of different instructions is called concurrency.

No special programming techniques are required to gain the advantages of concurrent execution; numeric instructions for the NPX are simply placed in line with the instructions for the CPU. CPU and numeric instructions are initiated in the same order as they are encountered by the CPU in its instruction stream. However, because numeric operations performed by the NPX generally require more time than operations performed by the CPU, the CPU can often execute several of its instructions before the NPX completes a numeric instruction previously initiated. This concurrency offers obvious performance advantages, but it also imposes several rules that must be observed to ensure proper synchronization of the 80386 CPU and 80387 NPX.
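The synchronization hazard can be illustrated with a toy Python model. It is not a timing-accurate simulation: the class and function names are hypothetical, and the only behavior modeled is that an unmasked NPX error is latched and reported only when the CPU reaches the next exception-checking ESC or WAIT instruction, while CPU instructions in between proceed. The two functions replay the FILD/INC/FSQRT orderings of Figure 5-8, assuming FILD faults (as in the stack-extension scenario of 5.2.1.1) and that the recovery routine redoes the FILD by re-reading COUNT from memory.

```python
class ToyMachine:
    """Toy CPU/NPX pair: an NPX error is reported only at the next ESC or WAIT."""

    def __init__(self, count: int):
        self.memory = {"COUNT": count}
        self.pending_operand = None   # operand address of an instruction that faulted
        self.handler_reloaded = None  # value the recovery routine re-reads

    def esc(self, operand=None, faults=False):
        self._report_pending()        # an ESC first checks for an earlier error
        if faults:
            self.pending_operand = operand  # error latched; the CPU keeps going

    def wait(self):
        self._report_pending()        # WAIT also checks for a pending error

    def cpu_inc(self, name):
        self.memory[name] += 1        # CPU instruction, runs concurrently

    def _report_pending(self):
        if self.pending_operand is not None:
            # Recovery routine: fix up the stack, then redo the failed
            # FILD by re-reading its operand from memory.
            self.handler_reloaded = self.memory[self.pending_operand]
            self.pending_operand = None


def incorrect_order(count: int) -> int:
    m = ToyMachine(count)
    m.esc("COUNT", faults=True)   # FILD COUNT (faults, e.g. stack overflow)
    m.cpu_inc("COUNT")            # INC COUNT runs before the error is reported
    m.esc()                       # FSQRT: FILD's error reported here
    return m.handler_reloaded


def proper_order(count: int) -> int:
    m = ToyMachine(count)
    m.esc("COUNT", faults=True)   # FILD COUNT
    m.esc()                       # FSQRT: error reported before COUNT changes
    m.cpu_inc("COUNT")            # INC COUNT is now safe
    return m.handler_reloaded
```

With COUNT initially 7, the recovery routine reloads 8 in the incorrect ordering and 7 in the proper one; only the placement of the next ESC instruction differs.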
All Intel high-level languages automatically provide for and manage concurrency in the NPX. Assembly-language programmers, however, must understand and manage some areas of concurrency in exchange for the flexibility and performance of programming in assembly language. This section is for the assembly-language programmer or the well-informed high-level-language programmer.

5.2.1 Managing Concurrency

Concurrent execution of the host and 80387 is easy to establish and maintain. The activities of numeric programs can be split into two major areas: program control and arithmetic. The program control part performs activities such as deciding what functions to perform, calculating addresses of numeric operands, and loop control. The arithmetic part simply adds, subtracts, multiplies, and performs other operations on the numeric operands. The NPX and host are designed to handle these two parts separately and efficiently.

Concurrency management is required to check for an exception before letting the 80386 change a value just used by the 80387. Almost any numeric instruction can, under the wrong circumstances, produce a numeric exception. For programmers in higher-level languages, all required synchronization is automatically provided by the appropriate compiler. For assembly-language programmers, exception synchronization remains their own responsibility. A complication is that programmers may not expect their numeric programs to cause numeric exceptions, yet in some systems exceptions may occur regularly (for example, when the register stack is extended to memory, as discussed in 5.2.1.1).

To better understand these points, consider what can happen when the NPX detects an exception. Depending on options determined by the software system designer, the NPX can do one of two things when a numeric exception occurs:

• The NPX can provide a default fix-up for selected numeric exceptions. Programs can mask individual exception types to indicate that the NPX should generate a safe, reasonable result whenever that exception occurs.
  The default exception fix-up activity is treated by the NPX as part of the instruction causing the exception; no external indication of the exception is given. When exceptions are detected, a flag is set in the numeric status register, but no information is available about where or when the exception occurred. If the NPX performs its default action for all exceptions, then the need for exception synchronization is not apparent. However, as will be shown later, this is not sufficient reason to ignore exception synchronization when designing programs that use the 80387.

• As an alternative to the NPX default fix-up of numeric exceptions, the 80386 CPU can be notified whenever an exception occurs. When a numeric exception is unmasked and the exception occurs, the NPX stops further execution of the numeric instruction and signals this event to the CPU. On the next occurrence of an ESC or WAIT instruction, the CPU traps to a software exception handler. The exception handler can then implement any sort of recovery procedure desired for any numeric exception detectable by the NPX.

Some ESC instructions do not check for exceptions. These are the nonwaiting forms FNINIT, FNSTENV, FNSAVE, FNSTSW, FNSTCW, and FNCLEX.

When the NPX signals an unmasked exception condition, it is requesting help. The fact that the exception was unmasked indicates that further numeric program execution under the arithmetic and programming rules of the NPX is unreasonable. If concurrent execution is allowed, the state of the CPU when it recognizes the exception is undefined. The CPU may have changed many of its internal registers and be executing a totally different program by the time the exception occurs. To handle this situation, the NPX has special registers, updated at the start of each numeric instruction, that describe the state of the numeric program when the failed instruction was attempted.

Exception synchronization ensures that the NPX is in a well-defined state after an unmasked numeric exception occurs.
Without a well-defined state, it would be impossible for exception recovery routines to determine why the numeric exception occurred, or to recover successfully from the exception.

The following two sections illustrate the need to always consider exception synchronization when writing 80387 code, even when the code is initially intended for execution with exceptions masked. If the code is later moved to an environment where exceptions are unmasked, the same code may not work correctly. Figure 5-8 shows an example of instructions, written without exception synchronization, that work initially but fail when moved into a new environment.

Figure 5-8. Exception Synchronization Examples

    INCORRECT ERROR SYNCHRONIZATION

            FILD    COUNT   ; NPX instruction
            INC     COUNT   ; CPU instruction alters operand
            FSQRT           ; subsequent NPX instruction -- error from
                            ; previous NPX instruction detected here

    PROPER ERROR SYNCHRONIZATION

            FILD    COUNT   ; NPX instruction
            FSQRT           ; subsequent NPX instruction -- error from
                            ; previous NPX instruction detected here
            INC     COUNT   ; CPU instruction alters operand

5.2.1.1 Incorrect Exception Synchronization

In Figure 5-8, three instructions are shown to load an integer, calculate its square root, then increment the integer. The 80386-to-80387 interface and the synchronous execution of the NPX emulator will allow this program to execute correctly when no exceptions occur on the FILD instruction.

This situation changes if the 80387 numeric register stack is extended to memory. To extend the NPX stack to memory, the invalid exception is unmasked. A push to a full register or a pop from an empty register sets SF and causes an invalid exception. The recovery routine for the exception must recognize this situation, fix up the stack, then perform the original operation. The recovery routine will not work correctly in the first example shown in the figure. The problem is that the value of COUNT is incremented before the NPX can signal the exception to the CPU.
Because COUNT is incremented before the exception handler is invoked, the recovery routine will load an incorrect value of COUNT, causing the program to fail or behave unreliably.

5.2.1.2 Proper Exception Synchronization

Exception synchronization relies on the WAIT instruction and the BUSY# and ERROR# signals of the 80387. When an unmasked exception occurs in the 80387, it asserts the ERROR# signal, signaling to the CPU that a numeric exception has occurred. The next time the CPU encounters a WAIT instruction or an exception-checking ESC instruction, the CPU acknowledges the ERROR# signal by trapping automatically to Interrupt #16, the processor-extension exception vector. If the following ESC or WAIT instruction is properly placed, the CPU will not yet have disturbed any information vital to recovery from the exception.

Chapter 6 System-Level Numeric Programming

System programming for 80387 systems requires a more detailed understanding of the 80387 NPX than does application programming. Such things as emulation, initialization, exception handling, and data and error synchronization are all the responsibility of the systems programmer. These topics are covered in detail in the sections that follow.

6.1 80386/80387 Architecture

On a software level, the 80387 NPX appears as an extension of the 80386 CPU. On the hardware level, however, the mechanisms by which the 80386 and 80387 interact are more complex. This section describes how the 80387 NPX and 80386 CPU interact and points out features of this interaction that are of interest to systems programmers.

6.1.1 Instruction and Operand Transfer

All transfers of instructions and operands between the 80387 and system memory are performed by the 80386 using I/O bus cycles. The 80387 appears to the CPU as a special peripheral device.
It is special in two respects: the CPU initiates I/O automatically when it encounters ESC instructions, and the CPU uses reserved I/O addresses to communicate with the 80387. These I/O operations are completely transparent to software. Because the 80386 actually performs all transfers between the 80387 and memory, no additional bus drivers, controllers, or other components are necessary to interface the 80387 NPX to the local bus. The 80387 can utilize instructions and operands located in any memory accessible to the 80386 CPU.

6.1.2 Independent of CPU Addressing Modes

Unlike the 80287, the 80387 is not sensitive to the addressing and memory management of the CPU. The 80387 operates the same regardless of whether the 80386 CPU is operating in real-address mode, in protected mode, or in virtual 8086 mode. The instruction FSETPM, which was necessary in 80286/80287 systems to set the 80287 into protected mode, is not needed for the 80387; the 80387 treats this instruction as a no-op.

Because the 80386 actually performs all transfers between the 80387 and memory, 80387 instructions can utilize any memory location accessible by the task currently executing on the 80386. When operating in protected mode, all references to memory operands are automatically verified by the 80386's memory management and protection mechanisms, as for any other memory references by the currently executing task. Protection violations associated with NPX instructions automatically cause the 80386 to trap to an appropriate exception handler.

To the numerics programmer, the operating modes of the 80386 affect only the manner in which the NPX instruction and data pointers are represented in memory following an FSAVE or FSTENV instruction. Each of these instructions produces one of four formats depending on both the operating mode and the operand-size attribute in effect for the instruction. The differences are detailed in the discussion of the FSAVE and FSTENV instructions in Chapter 4.
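Although the four formats differ in how the instruction and data pointers are laid out, their sizes depend only on the operand-size attribute. The short Python calculation below assumes the standard sizes of a 14-byte environment with the 16-bit operand size and a 28-byte environment with the 32-bit operand size; an FSAVE image adds the eight 10-byte data registers to the environment. The function names are illustrative only.

```python
# Sizes in bytes of the FSTENV/FLDENV environment image, by operand-size
# attribute. Real-address and protected mode use different layouts of the
# same size, which is why there are four formats but only two sizes.
ENVIRONMENT_BYTES = {16: 14, 32: 28}
REGISTER_BYTES = 10        # one 80-bit data register
REGISTER_COUNT = 8         # ST(0) through ST(7)

def fstenv_size(operand_size: int) -> int:
    return ENVIRONMENT_BYTES[operand_size]

def fsave_size(operand_size: int) -> int:
    """FSAVE/FRSTOR image: the environment followed by all eight registers."""
    return fstenv_size(operand_size) + REGISTER_COUNT * REGISTER_BYTES

if __name__ == "__main__":
    for mode in ("real-address", "protected"):
        for size in (16, 32):
            print(f"{mode} mode, {size}-bit operands: "
                  f"FSTENV {fstenv_size(size)} bytes, FSAVE {fsave_size(size)} bytes")
```

The arithmetic gives 14 + 80 = 94 bytes and 28 + 80 = 108 bytes, the two sizes of state record that an FRSTOR operand must describe.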
6.1.3 Dedicated I/O Locations

The 80387 NPX does not require that any memory addresses be set aside for special purposes. The 80387 does make use of I/O port addresses, but these are 32-bit addresses with the high-order bit set (i.e., at or above 80000000H); therefore, these I/O operations are completely transparent to 80386 software. Because these addresses are beyond the 64-Kbyte I/O addressing limit of the I/O instructions, 80386 programs cannot reference these reserved I/O addresses directly.

6.2 Processor Initialization and Control

One of the principal responsibilities of systems software is the initialization, monitoring, and control of the hardware and software resources of the system, including the 80387 NPX. This section describes issues related to system initialization and control, including recognition of the NPX, emulation of the 80387 NPX in software if the hardware is not available, and the handling of exceptions that may occur during 80387 execution.

6.2.1 System Initialization

During initialization of an 80386 system, systems software must

• Recognize the presence or absence of the NPX.
• Set flags in the 80386 MSW (the low-order word of CR0) to reflect the state of the numeric environment.
• Initialize the NPX, if an 80387 is present in the system.

All of these activities can be quickly and easily performed as part of the overall system initialization.

6.2.2 Hardware Recognition of the NPX

The 80386 identifies the type of its coprocessor (80287 or 80387) by sampling its ERROR# input some time after the falling edge of RESET and before executing the first ESC instruction. The 80287 keeps its ERROR# output in the inactive state after hardware reset; the 80387 keeps its ERROR# output in the active state after hardware reset. The 80386 records this difference in the ET bit of control register zero (CR0). The 80386 subsequently uses ET to control its interface with the coprocessor.
If ET is set, the 80386 employs the 32-bit protocol of the 80387; if ET is not set, it employs the 16-bit protocol of the 80287. Systems software can (if necessary) change the value of ET. There are three reasons that ET may not be set:

1. An 80287 is actually present.
2. No coprocessor is present.
3. An 80387 is present but it is connected in a nonstandard manner that does not trigger the setting of ET.

An example of case three is the PC/AT-compatible design described in Appendix F. In such cases, initialization software may need to change the value of ET.

6.2.3 Software Recognition of the NPX

Figure 6-1 shows an example of a recognition routine that determines whether an NPX is present, and distinguishes between the 80387 and the 8087/80287. This routine can be executed on any 80386, 80286, or 8086 hardware configuration that has an NPX socket.

The example guards against the possibility of accidentally reading an expected value from a floating data bus when no NPX is present. Data read from a floating bus is undefined. By expecting to read a specific bit pattern from the NPX, the routine protects itself from the indeterminate state of the bus. The example also avoids depending on any values in reserved bits, thereby maintaining compatibility with future numerics coprocessors.

Figure 6-1. Software Routine to Recognize the 80287

    $title('Test for presence of a Numerics Chip, Revision 1.0')

            name    Test_NPX

    stack   segment stack 'stack'
            dw      100 dup (?)
    sst     dw      ?
    stack   ends

    data    segment public 'data'
    temp    dw      0h
    data    ends

    dgroup  group   data, stack
    cgroup  group   code

    code    segment public 'code'
            assume  cs:cgroup, ds:dgroup

    start:
    ;
    ; Look for an 8087, 80287, or 80387 NPX.
    ; Note that we cannot execute WAIT on 8086/88 if no 8087 is present.
    ;
    test_npx:
            fninit                          ; Must use non-wait form
            mov     si, offset dgroup:temp
            mov     word ptr [si], 5A5AH    ; Initialize temp to non-zero value
            fnstsw  [si]                    ; Must use non-wait form of fstsw
    ; It is not necessary to use a WAIT instruction
    ; after fnstsw or fnstcw. Do not use one here.
            cmp     byte ptr [si], 0        ; See if correct status with zeroes was read
            jne     no_npx                  ; Jump if not a valid status word, meaning no NPX
    ;
    ; Now see if ones can be correctly written from the control word.
    ;
            fnstcw  [si]                    ; Look at the control word; do not use WAIT form
    ; Do not use a WAIT instruction here!
            mov     ax, [si]                ; See if ones can be written by NPX
            and     ax, 103fh               ; See if selected parts of control word look OK
            cmp     ax, 3fh                 ; Check that ones and zeroes were correctly read
            jne     no_npx                  ; Jump if no NPX is installed
    ;
    ; Some numerics chip is installed. NPX instructions and WAIT are now safe.
    ; See if the NPX is an 8087, 80287, or 80387.
    ; This code is necessary if a denormal exception handler is used or the
    ; new 80387 instructions will be used.
    ;
            fld1                            ; Must use default control word from FNINIT
            fldz                            ; Form infinity
            fdiv                            ; 8087/287 says +inf = -inf
            fld     st                      ; Form negative infinity
            fchs                            ; 80387 says +inf <> -inf
            fcompp                          ; See if they are the same and remove them
            fstsw   [si]                    ; Look at status from FCOMPP
            mov     ax, [si]
            sahf                            ; See if the infinities matched
            je      found_87_287            ; Jump if 8087/287 is present
    ;
    ; An 80387 is present. If denormal exceptions are used for an 8087/287,
    ; they must be masked. The 80387 will automatically normalize denormal
    ; operands faster than an exception handler can.
    ;
            jmp     found_387
    no_npx:
    ; set up for no NPX
    ; ...
    ;
            jmp     exit
    found_87_287:
    ; set up for 87/287
    ; ...
    ;
            jmp     exit
    found_387:
    ; set up for 387
    ; ...
    ;
    exit:
    code    ends
            end     start, ds:dgroup, ss:dgroup:sst

6.2.4 Configuring the Numerics Environment

Once systems software has determined the presence or absence of the 80387 or 80287 NPX, it must set either the MP or the EM bit in the 80386's control register zero (CR0) accordingly. The initialization routine can either

• Set the MP bit in CR0 to allow numeric instructions to be executed directly by the NPX, or
• Set the EM bit in CR0 to permit software emulation of the numeric instructions.

The MP (monitor coprocessor) flag of CR0 indicates to the 80386 whether an NPX is physically available in the system. The MP flag controls the function of the WAIT instruction. When executing a WAIT instruction, the 80386 tests the task switched (TS) bit only if MP is set; if it finds TS set under these conditions, the CPU traps to exception #7.

The Emulation Mode (EM) bit of CR0 indicates to the 80386 whether NPX functions are to be emulated.
If the CPU finds EM set when it executes an ESC instruction, program control is automatically trapped to exception #7. This gives the exception handler the opportunity to emulate the functions of an 80387. For correct 80386 operation, the EM bit must never be set concurrently with MP. The EM and MP bits of the 80386 are described in more detail in the 80386 Programmer's Reference Manual. Software emulation for the 80387 NPX is covered in the "80387 Emulation" section later in this chapter. In any case, if ESC instructions are to be executed, either the MP bit or the EM bit must be set, but not both.

6.2.5 Initializing the 80387

Initializing the 80387 NPX simply means placing the NPX in a known state unaffected by any activity performed earlier. A single FNINIT instruction performs this initialization. All the error masks are set, all registers are tagged empty, TOP is set to zero, and default rounding and precision controls are set. Table 6-1 shows the state of the 80387 NPX following FINIT or FNINIT. This state is compatible with that of the 80287 after FINIT or after hardware RESET.

The FNINIT instruction does not leave the 80387 in the same state as that which results from the hardware RESET signal. Following a hardware RESET signal, such as after initial power-up, the state of the 80387 differs in the following respects:

1. The mask bit for the invalid-operation exception is reset.
2. The invalid-operation exception flag is set.
3. The exception-summary bit is set (along with its mirror image, the B-bit).

These settings cause assertion of the ERROR# signal as described previously. The FNINIT instruction must be used to change the 80387 state to one compatible with the 80287.

Table 6-1. NPX Processor State Following Initialization

Field                      Value     Interpretation
Control Word
  (Infinity Control)*      0         Affine
  Rounding Control         00        Round to nearest
  Precision Control        11        64 bits
  Exception Masks          111111    All exceptions masked
Status Word
  (Busy)                   0         -
  Condition Code           0000      -
  Stack Top                000       Register 0 is stack top
  Exception Summary        0         No exceptions
  Stack Flag               0         -
  Exception Flags          000000    No exceptions
Tag Word
  Tags                     11        Empty
Registers                  N.C.      Not changed
Exception Pointers
  Instruction Code         N.C.      Not changed
  Instruction Address      N.C.      Not changed
  Operand Address          N.C.      Not changed

* The 80387 does not have infinity control. This value is listed to emphasize that programs written for the 80287 may not behave the same on the 80387 if they depend on this bit.

6.2.6 80387 Emulation

If it is determined that no 80387 NPX is available in the system, systems software may decide to emulate ESC instructions in software. The 80386 hardware readily supports this emulation, because the 80386 can be configured to trap to a software emulation routine whenever it encounters an ESC instruction in its instruction stream.

Whenever the 80386 CPU encounters an ESC instruction while its MP and EM bits are set for emulation (MP=0, EM=1), the 80386 automatically traps to interrupt #7, the "processor extension not available" exception. The return link stored on the stack points to the first byte of the ESC instruction, including the prefix byte(s), if any. The exception handler can use this return link to examine the ESC instruction and then emulate the numeric instruction in software. The emulator must advance the return pointer so that, upon return from the exception handler, execution resumes at the first instruction following the ESC instruction.

To an application program, execution on an 80386 system with 80387 emulation is almost indistinguishable from execution on a system with an 80387, except for the difference in execution speed.
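To advance the return pointer, the emulator must know how long the trapped instruction is. The Python sketch below shows one way to compute that length. It assumes that the code bytes at the return link are available as a bytes object and that the code segment's default address size is known. The function name esc_instruction_length and the prefix set are illustrative assumptions, not part of any Intel emulator interface. ESC instructions never carry immediate operands, so the length depends only on the prefixes, the opcode, the ModR/M byte, an optional SIB byte, and the displacement.

```python
# Sketch: length of a trapped ESC instruction, so an emulator can advance
# the saved return pointer past it. Handles prefixes, the ESC opcode
# (D8h-DFh), ModR/M, an optional SIB byte, and the displacement.
PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,  # segment overrides
            0x66, 0x67,                          # operand-/address-size
            0xF0, 0xF2, 0xF3}                    # LOCK, REPNE, REP


def esc_instruction_length(code: bytes, addr32: bool = True) -> int:
    """Byte length of the ESC instruction that starts at code[0].

    `addr32` is the code segment's default address size; an
    address-size prefix (67h) selects the other size.
    """
    i = 0
    addr_override = False
    while code[i] in PREFIXES:
        if code[i] == 0x67:
            addr_override = True
        i += 1
    if addr_override:
        addr32 = not addr32
    if not 0xD8 <= code[i] <= 0xDF:
        raise ValueError(f"not an ESC opcode: {code[i]:02X}h")
    modrm = code[i + 1]
    mod, rm = modrm >> 6, modrm & 7
    length = i + 2                       # prefixes + opcode + ModR/M
    if mod == 3:                         # register form, no memory operand
        return length
    if addr32:
        if rm == 4:                      # SIB byte follows ModR/M
            length += 1
            if mod == 0 and (code[i + 2] & 7) == 5:
                length += 4              # SIB with no base: disp32
        elif mod == 0 and rm == 5:
            length += 4                  # absolute disp32
        length += (0, 1, 4)[mod]         # no disp, disp8, disp32
    else:
        if mod == 0 and rm == 6:
            length += 2                  # absolute disp16
        length += (0, 1, 2)[mod]         # no disp, disp8, disp16
    return length
```

For example, the bytes 2E DC 96 08 00 00 00 (FCOM with a CS override and a 32-bit displacement from ESI) give a length of 7. In that case the handler would add 7 to the saved EIP before returning with IRET. The same two bytes DD 3C are two bytes long with 16-bit addressing but would require a SIB byte with 32-bit addressing. This is why the address size must be supplied.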
There are several important considerations when using emulation on an 80386 system:

• When operating in protected mode, numeric applications using the emulator must be executed in execute-readable code segments. Numeric software cannot be emulated if it is executed in execute-only code segments. This is because the emulator must be able to examine the particular numeric instruction that caused the emulation trap.
• Only privileged tasks can place the 80386 in emulation mode. The instructions necessary to place the 80386 in emulation mode are privileged instructions, and are not typically accessible to an application.

An emulator package (EMUL387) that runs on 80386 systems is available from Intel. This emulation package operates in real mode, protected mode, and virtual 8086 mode, providing a complete functional equivalent of the 80387 in software.

When using the EMUL387 emulator, writers of numeric exception handlers should be aware of one slight difference between the emulated 80387 and the 80387 hardware:

• On the 80387 hardware, exception handlers are invoked by the 80386 at the first WAIT or ESC instruction following the instruction causing the exception. The return link, stored on the 80386 stack, points to this second WAIT or ESC instruction, where execution will resume following a return from the exception handler.
• Using the EMUL387 emulator, numeric exception handlers are invoked from within the emulator itself. The return link stored on the stack when the exception handler is invoked will therefore point back to the EMUL387 emulator, rather than to the program code actually being executed (emulated). An IRET return from the exception handler returns to the emulator, which then returns immediately to the emulated program.

This added layer of indirection should not cause confusion, however, because the instruction causing the exception can always be identified from the 80387's instruction and data pointers.
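The Python sketch below shows how a handler might recover the faulting instruction from the 80387's instruction and data pointers. It decodes a 28-byte environment image of the kind stored by FSTENV or FNSTENV. It assumes the 32-bit protected-mode layout: the control, status, and tag words; then the instruction offset; a doubleword holding the CS selector and the 11-bit opcode; and finally the operand offset and selector. Real-mode and 16-bit formats differ and are not handled. The names parse_env32 and NpxEnvironment are hypothetical.

```python
import struct
from typing import NamedTuple


class NpxEnvironment(NamedTuple):
    control: int
    status: int
    tag: int
    ip_offset: int         # offset of the instruction that faulted
    cs_selector: int
    opcode: int            # 11-bit opcode field
    operand_offset: int
    operand_selector: int

    @property
    def instruction_bytes(self) -> bytes:
        """Rebuild the ESC opcode byte (D8h-DFh) and the ModR/M byte."""
        return bytes([0xD8 | (self.opcode >> 8), self.opcode & 0xFF])

    @property
    def top(self) -> int:
        """TOP field of the status word (bits 13-11)."""
        return (self.status >> 11) & 7


def parse_env32(image: bytes) -> NpxEnvironment:
    """Decode a 28-byte FSTENV/FNSTENV image (assumed 32-bit PM format)."""
    if len(image) != 28:
        raise ValueError("expected a 28-byte environment image")
    cw, sw, tw, ip, cs_op, opnd_off, opnd_sel = struct.unpack("<7I", image)
    return NpxEnvironment(
        control=cw & 0xFFFF,
        status=sw & 0xFFFF,
        tag=tw & 0xFFFF,
        ip_offset=ip,
        cs_selector=cs_op & 0xFFFF,
        opcode=(cs_op >> 16) & 0x7FF,
        operand_offset=opnd_off,
        operand_selector=opnd_sel & 0xFFFF,
    )
```

The 11-bit opcode field holds the low three bits of the ESC opcode byte and the whole ModR/M byte. That is why instruction_bytes can rebuild the first two bytes of the faulting instruction by OR-ing D8h back in.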
6.2.7 Handling Numerics Exceptions

Once the 80387 has been initialized and normal execution of applications has begun, the 80387 NPX may occasionally require attention in order to recover from numeric processing exceptions. This section explains how to write software exception handlers for numeric exceptions. Numeric processing exceptions have already been introduced in Chapter 3.

The 80387 NPX can take one of two actions when it recognizes a numeric exception:

• If the exception is masked, the NPX automatically performs its own masked exception response, correcting the exception condition according to fixed rules, and then continues with its instruction execution.
• If the exception is unmasked, the NPX signals the exception to the 80386 CPU using the ERROR# status line between the two processors. Each time the 80386 encounters an ESC or WAIT instruction in its instruction stream, the CPU checks the condition of this ERROR# status line. If ERROR# is active, the CPU automatically traps to interrupt vector #16, the Processor Extension Error trap.

Interrupt vector #16 typically points to a software exception handler, which may or may not be a part of systems software. This exception handler takes the form of an 80386 interrupt procedure.

When handling numeric errors, the CPU has two responsibilities:

• The CPU must not disturb the numeric context when an error is detected.
• The CPU must clear the error and attempt recovery from the error.

Although the manner in which programmers may treat these responsibilities varies from one implementation to the next, most exception handlers will include these basic steps:

• Store the NPX environment (control, status, and tag words, operand and instruction pointers) as it existed at the time of the exception.
• Clear the exception bits in the status word.
• Enable interrupts on the CPU.
• Identify the exception by examining the status and control words in the saved environment.
• Take some system-dependent action to rectify the exception.
• Return to the interrupted program and resume normal execution.

6.2.8 Simultaneous Exception Response

In cases where multiple exceptions arise simultaneously, the 80387 signals one exception according to the precedence shown at the end of Chapter 3. This means, for example, that an SNaN divided by zero results in an invalid-operation exception, not a zero-divide exception.

6.2.9 Exception Recovery Examples

Recovery routines for NPX exceptions can take a variety of forms. They can change the arithmetic and programming rules of the NPX. These changes may redefine the default fix-up for an error, change the appearance of the NPX to the programmer, or change how arithmetic is defined on the NPX.

A change to an exception response might be to automatically normalize all denormals loaded from memory. A change in appearance might be extending the register stack into memory to provide an "infinite" number of numeric registers. The arithmetic of the NPX can be changed to automatically extend the precision and range of variables when exceeded. All these functions can be implemented on the NPX via numeric exceptions and associated recovery routines in a manner transparent to the application programmer.

Some other possible application-dependent actions might include:

• Incrementing an exception counter for later display or printing
• Printing or displaying diagnostic information (e.g., the 80387 environment and registers)
• Aborting further execution
• Storing a diagnostic value (a NaN) in the result and continuing with the computation

Notice that an exception may or may not constitute an error, depending on the application. Once the exception handler corrects the condition causing the exception, the floating-point instruction that caused the exception can be restarted, if appropriate.
This cannot be accomplished using the IRET instruction, however, because the trap occurs at the ESC or WAIT instruction following the offending ESC instruction. The exception handler must first obtain (using FSAVE or FSTENV) the address of the offending instruction in the task that initiated it. It must then make a copy of that instruction, execute the copy in the context of the offending task, and finally return via IRET to the current CPU instruction stream.

In order to correct the condition causing the numeric exception, exception handlers must recognize the precise state of the NPX at the time the exception handler was invoked. They must also be able to reconstruct the state of the NPX when the exception initially occurred. To reconstruct the state of the NPX, programmers must understand when, during the execution of an NPX instruction, exceptions are actually recognized.

Invalid-operation, zero-divide, and denormal-operand exceptions are detected before an operation begins. Overflow, underflow, and precision exceptions are not raised until a true result has been computed. When a "before" exception is detected, the NPX register stack and memory have not yet been updated, and appear as if the offending instruction has not been executed. When an "after" exception is detected, the register stack and memory appear as if the instruction has run to completion; i.e., they may be updated. (However, in a store or store-and-pop operation, unmasked overflow or underflow is handled like a "before" exception: memory is not updated and the stack is not popped.)

The programming examples in Chapter 7 include an outline of several exception handlers to process numeric exceptions for the 80387.

Chapter 7 Numeric Programming Examples

The following sections contain examples of numeric programs for the 80387 NPX written in ASM386.
These examples are intended to illustrate some of the techniques for programming the 80386/80387 computing system for numeric applications.

7.1 Conditional Branching Example

As discussed in Chapter 2, several numeric instructions post their results to the condition code bits of the 80387 status word. Although there are many ways to implement conditional branching following a comparison, the basic approach is as follows:

• Execute the comparison.
• Store the status word. (The 80387 allows the status word to be stored directly into the AX register.)
• Inspect the condition code bits.
• Jump on the result.

Figure 7-1 is a code fragment that illustrates how two memory-resident double-format real numbers might be compared (similar code could be used with the FTST instruction). The numbers are called A and B, and the comparison is A to B.

The comparison itself requires loading A onto the top of the 80387 register stack and then comparing it to B, while popping the stack with the same instruction. The status word is then written into the 80386 AX register.

A and B have four possible orderings, and bits C3, C2, and C0 of the condition code indicate which ordering holds. These bits are positioned in the upper byte of the NPX status word so that they correspond to the CPU's zero, parity, and carry flags (ZF, PF, and CF) when the byte is written into the flags. The code fragment copies C3, C2, and C0 of the NPX status word into ZF, PF, and CF of the CPU flags register. It then uses the CPU conditional jump instructions to test the flags. The resulting code is extremely compact, requiring only seven instructions.

The FXAM instruction updates all four condition code bits. Figure 7-2 shows how a jump table can be used to determine the characteristics of the value examined. The jump table (FXAM_TBL) is initialized to contain the 32-bit displacements of 16 labels, one for each possible condition code setting. Note that four of the table entries contain the same value, "EMPTY."
The first two of these (C3 = 1, C2 = 0, C0 = 1, with either value of C1) are the 80387's encodings for an empty register. The two other table entries that contain "EMPTY" will never be used on the 80387, but may be used if the code is executed with an 80287.

The program fragment performs the FXAM and stores the status word. It then manipulates the condition code bits to produce a number in register EAX that equals the condition code times 4. This involves zeroing the unused bits in the byte that contains the code, shifting C3 so that it is adjacent to C2, and positioning the four-bit code two places up in AL so that it is multiplied by 4. The resulting value is used as an index that selects one of the displacements from FXAM_TBL. The multiplication is required because each DD entry in FXAM_TBL is 4 bytes long. The unconditional JMP instruction effectively vectors through the jump table to the labeled routine that contains code (not shown in the example) to process each possible result of the FXAM instruction.

Figure 7-1. Conditional Branching for Compares

        .
        .
        .
A       DQ      ?
B       DQ      ?
        .
        .
        .
        FLD     A               ; LOAD A ONTO TOP OF 387 STACK
        FCOMP   B               ; COMPARE A:B, POP A
        FSTSW   AX              ; STORE RESULT TO CPU AX REGISTER
;
; CPU AX REGISTER CONTAINS CONDITION CODES
; (RESULTS OF COMPARE)
; LOAD CONDITION CODES INTO CPU FLAGS
;
        SAHF
;
; USE CONDITIONAL JUMPS TO DETERMINE ORDERING OF A TO B
;
        JP      A_B_UNORDERED   ; TEST C2 (PF)
        JB      A_LESS          ; TEST C0 (CF)
        JE      A_EQUAL         ; TEST C3 (ZF)
A_GREATER:                      ; C0 (CF) = 0, C3 (ZF) = 0
        .
        .
A_EQUAL:                        ; C0 (CF) = 0, C3 (ZF) = 1
        .
        .
A_LESS:                         ; C0 (CF) = 1, C3 (ZF) = 0
        .
        .
A_B_UNORDERED:                  ; C2 (PF) = 1
        .
        .

Figure 7-2. Conditional Branching for FXAM

; JUMP TABLE FOR EXAMINE ROUTINE
;
FXAM_TBL DD     POS_UNNORM, POS_NAN, NEG_UNNORM, NEG_NAN,
&               POS_NORM, POS_INFINITY, NEG_NORM,
&               NEG_INFINITY, POS_ZERO, EMPTY, NEG_ZERO,
&               EMPTY, POS_DENORM, EMPTY, NEG_DENORM, EMPTY
        .
        .
; EXAMINE ST AND STORE RESULT (CONDITION CODES)
        FXAM
        XOR     EAX,EAX         ; CLEAR EAX
        FSTSW   AX
; CALCULATE OFFSET INTO JUMP TABLE
        AND     AX,0100011100000000B ; CLEAR ALL BITS EXCEPT C3, C2-C0
        SHR     EAX,6           ; SHIFT C2-C0 INTO PLACE (000XXX00)
        SAL     AH,5            ; POSITION C3 (00X00000)
        OR      AL,AH           ; DROP C3 IN ADJACENT TO C2 (00XXXX00)
        XOR     AH,AH           ; CLEAR OUT THE OLD COPY OF C3
; JUMP TO THE ROUTINE 'ADDRESSED' BY CONDITION CODE
        JMP     FXAM_TBL[EAX]
; HERE ARE THE JUMP TARGETS, ONE TO HANDLE
; EACH POSSIBLE RESULT OF FXAM
POS_UNNORM:
        .
POS_NAN:
        .
NEG_UNNORM:
        .
NEG_NAN:
        .
POS_NORM:
        .
POS_INFINITY:
        .
NEG_NORM:
        .
NEG_INFINITY:
        .
POS_ZERO:
        .
EMPTY:
        .
NEG_ZERO:
        .
POS_DENORM:
        .
NEG_DENORM:
        .

7.2 Exception Handling Examples

There are many approaches to writing exception handlers. One useful technique is to consider the exception handler procedure as consisting of "prologue," "body," and "epilogue" sections of code. This procedure is invoked via interrupt number 16.

At the beginning of the prologue, CPU interrupts have been disabled. The prologue performs all functions that must be protected from possible interruption by higher-priority sources. Typically, this involves saving CPU registers and transferring diagnostic information from the 80387 to memory. When the critical processing has been completed, the prologue may enable CPU interrupts to allow higher-priority interrupt handlers to preempt the exception handler.

The body of the exception handler examines the diagnostic information and makes a response that is necessarily application-dependent. This response may range from halting execution, to displaying a message, to attempting to repair the problem and proceed with normal execution.

The epilogue essentially reverses the actions of the prologue, restoring the CPU and the NPX so that normal execution can be resumed. The epilogue must not load an unmasked exception flag into the 80387, or another exception will be requested immediately.
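The epilogue rule can be expressed as a simple check on the saved control and status words before FRSTOR or FLDENV reloads them. The Python sketch below assumes the standard bit assignment: the exception flags occupy status bits 0-5 and the corresponding masks occupy control bits 0-5, in the order invalid operation, denormal operand, zero divide, overflow, underflow, precision. The helper names are hypothetical. clear_exception_byte models the MOV BYTE PTR ..., 0H used in the skeleton handlers of Figures 7-3 through 7-5. That instruction clears the exception flags, the stack-fault bit, and the exception-summary bit, while leaving TOP and the condition codes intact.

```python
# Sketch: would reloading a saved status word request another exception?
# Bit i (0-5) of the status word is an exception flag; bit i of the
# control word is its mask (1 = masked).
EXCEPTION_NAMES = ("invalid operation", "denormal operand", "zero divide",
                   "overflow", "underflow", "precision")


def unmasked_exceptions(control: int, status: int) -> list:
    """Names of exception flags set in `status` whose masks are clear."""
    pending = status & ~control & 0x3F
    return [name for bit, name in enumerate(EXCEPTION_NAMES)
            if (pending >> bit) & 1]


def clear_exception_byte(status: int) -> int:
    """Model of clearing the low byte of the saved status word: zero the
    exception flags, stack-fault bit, and exception-summary bit; keep
    TOP and C0-C3 in the high byte."""
    return status & 0xFF00
```

An epilogue that reloads a status image for which unmasked_exceptions returns a non-empty list would, per the rule above, immediately request another exception. Clearing the low byte first makes the list empty.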
Figures 7-3 through 7-5 show the ASM386 coding of three skeleton exception handlers. They show how prologues and epilogues can be written for various situations. For the application-dependent exception handling body, they provide only comments indicating where that code should be placed.

Figures 7-3 and 7-4 are very similar; their only substantial difference is their choice of instructions to save and restore the 80387. The tradeoff here is between the increased diagnostic information provided by FNSAVE and the faster execution of FNSTENV. For applications that are sensitive to interrupt latency or that do not need to examine register contents, FNSTENV reduces the duration of the "critical region," during which the CPU does not recognize another interrupt request.

After the exception handler body, the epilogues prepare the CPU and the NPX to resume execution from the point of interruption (i.e., the instruction following the one that generated the unmasked exception). Notice that the exception flags in the memory image are cleared to zero before the image is reloaded into the 80387. In these examples, the entire low byte of the status word image is cleared, which also clears the stack-fault and exception-summary bits.

The examples in Figures 7-3 and 7-4 assume that the exception handler itself will not cause an unmasked exception. Where this is a possibility, the general approach shown in Figure 7-5 can be employed. The basic technique is to save the full 80387 state and then to load a new control word in the prologue. Considerable care should be taken when designing an exception handler of this type to prevent the handler from being reentered endlessly.

Figure 7-3.
Full-State Exception Handler

SAVE_ALL PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE
; FOR 80387 STATE IMAGE
        PUSH    EBP
        MOV     EBP,ESP
        SUB     ESP,108
; SAVE FULL 80387 STATE, ENABLE CPU INTERRUPTS
        FNSAVE  [EBP-108]
        STI
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING
; CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD
; (WHICH IS IN MEMORY)
; RESTORE MODIFIED STATE IMAGE
        MOV     BYTE PTR [EBP-104], 0H
        FRSTOR  [EBP-108]
; DEALLOCATE STACK SPACE, RESTORE CPU REGISTERS
        MOV     ESP,EBP
        .
        .
        POP     EBP
;
; RETURN TO INTERRUPTED CALCULATION
        IRET
SAVE_ALL ENDP

Figure 7-4. Reduced-Latency Exception Handler

SAVE_ENVIRONMENT PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE
; FOR 80387 ENVIRONMENT
        PUSH    EBP
        .
        MOV     EBP,ESP
        SUB     ESP,28
; SAVE ENVIRONMENT, ENABLE CPU INTERRUPTS
        FNSTENV [EBP-28]
        STI
;
; APPLICATION EXCEPTION-HANDLING CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD
; (WHICH IS IN MEMORY)
; RESTORE MODIFIED ENVIRONMENT IMAGE
        MOV     BYTE PTR [EBP-24], 0H
        FLDENV  [EBP-28]
; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
        MOV     ESP,EBP
        POP     EBP
;
; RETURN TO INTERRUPTED CALCULATION
        IRET
SAVE_ENVIRONMENT ENDP

Figure 7-5. Reentrant Exception Handler

        .
        .
        .
LOCAL_CONTROL DW ?              ; ASSUME INITIALIZED
        .
        .
        .
REENTRANT PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE FOR
; 80387 STATE IMAGE
        PUSH    EBP
        .
        .
        .
        MOV     EBP,ESP
        SUB     ESP,108
; SAVE STATE, LOAD NEW CONTROL WORD,
; ENABLE CPU INTERRUPTS
        FNSAVE  [EBP-108]
        FLDCW   LOCAL_CONTROL
        STI
        .
        .
        .
; APPLICATION EXCEPTION HANDLING CODE GOES HERE.
; AN UNMASKED EXCEPTION GENERATED HERE WILL
; CAUSE THE EXCEPTION HANDLER TO BE REENTERED.
; IF LOCAL STORAGE IS NEEDED, IT MUST BE
; ALLOCATED ON THE CPU STACK.
        .
        .
        .
; CLEAR EXCEPTION FLAGS IN STATUS WORD
; (WHICH IS IN MEMORY)
; RESTORE MODIFIED STATE IMAGE
        MOV     BYTE PTR [EBP-104], 0H
        FRSTOR  [EBP-108]
; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
        MOV     ESP,EBP
        .
        .
        .
        POP     EBP
; RETURN TO POINT OF INTERRUPTION
        IRET
REENTRANT ENDP

7.3 Floating-Point to ASCII Conversion Examples

Numeric programs must typically format their results at some point for presentation and inspection by the program user. In many cases, numeric results are formatted as ASCII strings for printing or display. This example shows how floating-point values can be converted to decimal ASCII character strings. The function shown in Figure 7-6 can be invoked from PL/M-386, Pascal-386, FORTRAN-386, or ASM386 routines.

The routine favors shortness, speed, and accuracy over providing the maximum number of significant digits possible. It attempts to keep integers in their own domain to avoid unnecessary conversion errors.

Using the extended-precision real number format, this routine achieves a worst-case accuracy of three units in the 16th decimal position for a noninteger value or for integers greater than 10^18. This is double-precision accuracy. With values having decimal exponents less than 100 in magnitude, the accuracy is one unit in the 17th decimal position.

Higher precision can be achieved with greater care in programming, at the cost of larger program size and lower performance.

Figure 7-6. Floating-Point to ASCII Conversion Routine

XENIX286 80386 MACRO ASSEMBLER V1.0, ASSEMBLY OF MODULE FLOATING_TO_ASCII
OBJECT MODULE PLACED IN fpasc.obj
ASSEMBLER INVOKED BY: asm386 fpasc.asm

$title('Convert a floating point number to ASCII')

        name    floating_to_ascii

        public  floating_to_ascii
        extrn   get_power_10:near,tos_status:near
;
; This subroutine will convert the floating point
; number in the top of the NPX stack to an ASCII
; string and separate power of 10 scaling value
; (in binary). The maximum width of the ASCII string
; formed is controlled by a parameter which must be
; > 1. Unnormal values, denormal values, and pseudo
; zeroes will be correctly converted.
; However, unnormals
; and pseudo zeros are no longer supported formats on the
; 80387 (in conformance with the IEEE floating point
; standard) and hence are not generated internally. A
; returned value will indicate how many binary bits
; of precision were lost in an unnormal or denormal
; value. The magnitude (in terms of binary power)
; of a pseudo zero will also be indicated. Integers
; less than 10**18 in magnitude are accurately converted
; if the destination ASCII string field is wide enough
; to hold all the digits. Otherwise the value is converted
; to scientific notation.
;
; The status of the conversion is identified by the
; return value. It can be:
;
;       0       conversion complete, string_size is defined
;       1       invalid arguments
;       2       exact integer conversion, string_size is defined
;       3       indefinite
;       4       + NAN (Not A Number)
;       5       - NAN
;       6       + Infinity
;       7       - Infinity
;       8       pseudo zero found, string_size is defined
;
; The PL/M-386 calling convention is:
;
;floating_to_ascii:
;    procedure (number,denormal_ptr,string_ptr,size_ptr,
;        field_size, power_ptr) word external;
;    declare (denormal_ptr,string_ptr,power_ptr,size_ptr)
;        pointer;
;    declare field_size word,
;        string_size based size_ptr word;
;    declare number real;
;    declare denormal integer based denormal_ptr;
;    declare power integer based power_ptr;
;    end floating_to_ascii;
;
; The floating point value is expected to be
; on the top of the NPX stack. This subroutine
; expects 3 free entries on the NPX stack and
; will pop the passed value off when done. The
; generated ASCII string will have a leading
; character either '-' or '+' indicating the sign
; of the value. The ASCII decimal digits will
; immediately follow. The numeric value of the
; ASCII string is (ASCII STRING.)*10**POWER.
; If the given number was zero, the ASCII string will
; contain a sign and a single zero character. The
; value string_size indicates the total length of
; the ASCII string including the sign character.
; String(0) will always hold the sign. It is
; possible for string_size to be less than
; field_size. This occurs for zeroes or integer
; values. A pseudo zero will return a special
; return code. The denormal count will indicate
; the power of two originally associated with the
; value. The power of ten and ASCII string will
; be as if the value was an ordinary zero.
;
; This subroutine is accurate up to a maximum of
; 18 decimal digits for integers. Integer values
; will have a decimal power of zero associated
; with them. For non integers, the result will be
; accurate to within 2 decimal digits of the 16th
; decimal place (double precision). The exponentiate
; instruction is also used for scaling the value into
; the range acceptable for the BCD data type. The
; rounding mode in effect on entry to the
; subroutine is used for the conversion.
;
; The following registers are not transparent:
;
;       eax ebx ecx edx esi edi eflags
;
;
; Define the stack layout.
;
ebp_save        equ     dword ptr [ebp]
es_save         equ     ebp_save + size ebp_save
return_ptr      equ     es_save + size es_save
power_ptr       equ     return_ptr + size return_ptr
field_size      equ     power_ptr + size power_ptr
size_ptr        equ     field_size + size field_size
string_ptr      equ     size_ptr + size size_ptr
denormal_ptr    equ     string_ptr + size string_ptr

parms_size      equ     size power_ptr + size field_size +
&                       size size_ptr + size string_ptr +
&                       size denormal_ptr
;
; Define constants used
;
BCD_DIGITS      equ     18      ; Number of digits in bcd_value
WORD_SIZE       equ     4
BCD_SIZE        equ     10
MINUS           equ     1       ; Define return values
NAN             equ     4       ; The exact values chosen
INFINITY        equ     6       ; here are important. They must
INDEFINITE      equ     3       ; correspond to the possible return
PSEUDO_ZERO     equ     8       ; values and be in the same numeric
INVALID         equ     -2      ; order as tested by the program.
ZERO            equ     -4
DENORMAL        equ     -6
UNNORMAL        equ     -8
NORMAL          equ     0
EXACT           equ     2
;
; Define layout of temporary storage area.
;
power_two       equ     word ptr [ebp - WORD_SIZE]
bcd_value       equ     tbyte ptr power_two - BCD_SIZE
bcd_byte        equ     byte ptr bcd_value
fraction        equ     bcd_value

local_size      equ     size power_two + size bcd_value
;
; Allocate stack space for the temporaries so
; the stack will be big enough
;
stack   stackseg (local_size+6)         ; Allocate stack
                                        ; space for locals
$eject
code    segment public er
        extrn   power_table:qword
;
; Constants used by this function.
;
        even                            ; Optimize for 16 bits
const10 dw      10                      ; Adjustment value for
                                        ; too big BCD
;
; Convert the C3,C2,C1,C0 encoding from tos_status
; into meaningful bit flags and values.
;
status_table    db      UNNORMAL, NAN, UNNORMAL + MINUS,
&                       NAN + MINUS, NORMAL, INFINITY,
&                       NORMAL + MINUS, INFINITY + MINUS,
&                       ZERO, INVALID, ZERO + MINUS, INVALID,
&                       DENORMAL, INVALID, DENORMAL + MINUS, INVALID

floating_to_ascii proc

        call    tos_status              ; Look at status of ST(0)

; Get descriptor from table
        movzx   eax, status_table[eax]
        cmp     al,INVALID              ; Look for empty ST(0)
        jne     not_empty
;
; ST(0) is empty! Return the status value.
;
        ret     parms_size
;
; Remove infinity from stack and exit.
;
found_infinity:
        fstp    st(0)                   ; OK to leave fstp running
        jmp     short exit_proc
;
; String space is too small!
; Return invalid code.
;
small_string:
        mov     al,INVALID
exit_proc:
        leave                           ; Restore stack setup
        pop     es
        ret     parms_size
;
; ST(0) is NAN or indefinite. Store the
; value in memory and look at the fraction
; field to separate indefinite from an ordinary NAN.
;
NAN_or_indefinite:
        fstp    fraction                ; Remove value from stack
                                        ; for examination
        test    al,MINUS                ; Look at sign bit
        fwait                           ; Ensure store is done
        jz      exit_proc               ; Can't be indefinite if
                                        ; positive

        mov     ebx,0C0000000H          ; Match against upper 32
                                        ; bits of fraction

; Compare bits 63-32
        sub     ebx, dword ptr fraction + 4

; Bits 31-0 must be zero
        or      ebx, dword ptr fraction
        jnz     exit_proc

; Set return value for indefinite value
        mov     al,INDEFINITE
        jmp     exit_proc
;
; Allocate stack space for local variables
; and establish parameter addressability.
;
not_empty:
        push    es                      ; Save working register
        enter   local_size, 0           ; Setup stack addressing


; Check for enough string space
        mov     ecx,field_size
        cmp     ecx,2
        jl      small_string

        dec     ecx                     ; Adjust for sign character

; See if string is too large for BCD
        cmp     ecx,BCD_DIGITS
        jbe     size_ok

; Else set maximum string size
        mov     ecx,BCD_DIGITS
size_ok:
        cmp     al,INFINITY             ; Look for infinity

; Return status value for + or - inf
        jge     found_infinity

        cmp     al,NAN                  ; Look for NAN or INDEFINITE
        jge     NAN_or_indefinite
;
; Set default return values and check that
; the number is normalized.
;
        fabs                            ; Use positive value only
                                        ; sign bit in al has true sign of value
        xor     edx,edx                 ; Form 0 constant
        mov     edi,denormal_ptr        ; Zero denormal count
        mov     [edi], dx
        mov     ebx,power_ptr           ; Zero power of ten value
        mov     [ebx], dx
        mov     dl, al
        and     dl, 1
        add     dl, EXACT
        cmp     al,ZERO                 ; Test for zero
        jae     convert_integer         ; Skip power code if value
                                        ; is zero
        fstp    fraction
        fwait
        mov     al, bcd_byte + 7
        or      byte ptr bcd_byte + 7, 80h
        fld     fraction
        fxtract
        test    al, 80h
        jnz     normal_value

        fld1
        fsub
        ftst
        fstsw   ax
        sahf
        jnz     set_unnormal_count
;
; Found a pseudo zero
;
        fldlg2                          ; Develop power of ten estimate
        add     dl, PSEUDO_ZERO - EXACT
        fmulp   st(2), st
        fxch                            ; Get power of ten
        fistp   word ptr [ebx]          ; Set power of ten
        jmp     convert_integer

set_unnormal_count:
        fxtract                         ; Get original fraction,
                                        ; now normalized
        fxch                            ; Get unnormal count
        fchs
        fistp   word ptr [edi]          ; Set unnormal count


; Calculate the decimal magnitude associated
; with this number to within one order. This
; error will always be inevitable due to
; rounding and lost precision. As a result,
; we will deliberately fail to consider the
; LOG10 of the fraction value in calculating
; the order. Since the fraction will always
; be 1 <= F < 2, its LOG10 will not change
; the basic accuracy of the function.
; To get the decimal order of magnitude, simply
; multiply the power of two by LOG10(2) and
; truncate the result to an integer.
;
normal_value:
        fstp    fraction                ; Save the fraction field
                                        ; for later use
        fist    power_two               ; Save power of two
        fldlg2                          ; Get LOG10(2)
                                        ; Power_two is now safe to use
        fmul                            ; Form LOG10(of exponent of number)
        fistp   word ptr [ebx]          ; Any rounding mode
                                        ; will work here
;
; Check if the magnitude of the number rules
; out treating it as an integer.
;
; CX has the maximum number of decimal digits
; allowed.
;
        fwait                           ; Wait for power_ten to be valid

; Get power of ten of value
        movsx   esi, word ptr [ebx]
        sub     esi,ecx                 ; Form scaling factor
                                        ; necessary in ax
        ja      adjust_result           ; Jump if number will not fit
;
; The number is between 1 and 10**(field_size).
; Test if it is an integer.
;
        fild    power_two               ; Restore original number
        sub     dl,NORMAL-EXACT         ; Convert to exact return
                                        ; value
        fld     fraction
        fscale                          ; Form full value, this
                                        ; is safe here
        fst     st(1)                   ; Copy value for compare
        frndint                         ; Test if it is an integer
        fcomp                           ; Compare values
        fstsw   ax                      ; Save status
        sahf                            ; C3=1 implies it was
                                        ; an integer
        jnz     convert_integer

        fstp    st(0)                   ; Remove non integer value
        add     dl,NORMAL-EXACT         ; Restore original return value
;
; Scale the number to within the range allowed
; by the BCD format. The scaling operation should
; produce a number within one decimal order of
; magnitude of the largest decimal number
; representable within the given string width.
;
; The scaling power of ten value is in si.
;
adjust_result:
        mov     eax,esi         ; Setup for pow10
        mov     word ptr [ebx],ax ; Set initial power
                                ; of ten return value
        neg     eax             ; Subtract one for each order of
                                ; magnitude the value is scaled by
        call    get_power_10    ; Scaling factor is
                                ; returned as
                                ; exponent and fraction
        fld     fraction        ; Get fraction
        fmul                    ; Combine fractions
        mov     esi,ecx         ; Form power of ten of
                                ; the maximum
        shl     esi,3           ; BCD value to fit in
                                ; the string
        fild    power_two       ; Combine powers of two
        faddp   st(2),st
        fscale                  ; Form full value,
                                ; exponent was safe
        fstp    st(1)           ; Remove exponent
;
; Test the adjusted value against a table
; of exact powers of ten. The combined errors
; of the magnitude estimate and power function
; can result in a value one order of magnitude
; too small or too large to fit correctly in
; the BCD field. To handle this problem, pretest
; the adjusted value; if it is too small or too
; large, adjust it by ten and adjust the
; power of ten value.
;
test_power:

; Compare against exact power entry. Use the next
; entry since cx has been decremented by one
        fcom    power_table[esi]+type power_table
        fstsw   ax              ; No wait is necessary
        sahf                    ; If C3 = C0 = 0 then
        jb      test_for_small  ; too big

        fidiv   const10         ; Else adjust value
        and     dl,not EXACT    ; Remove exact flag
        inc     word ptr [ebx]  ; Adjust power of ten value
        jmp     short in_range  ; Convert the value to a BCD
                                ; integer
test_for_small:
        fcom    power_table[esi] ; Test relative size
        fstsw   ax              ; No wait is necessary
        sahf                    ; If C0 = 0 then
                                ; st(0) >= lower bound
        jnc     in_range        ; Convert the value to a
                                ; BCD integer

        fimul   const10         ; Adjust value into range
        dec     word ptr [ebx]  ; Adjust power of ten value
in_range:
        frndint                 ; Form integer value
;
; Assert: 0 <= TOS <= 999,999,999,999,999,999
; The TOS number will be exactly representable
; in 18 digit BCD format.
;
convert_integer:
        fbstp   bcd_value       ; Store as BCD format number
;
; While the BCD store runs, set up registers
; for the conversion to ASCII.
;
        mov     esi,BCD_SIZE-2  ; Initial BCD index value
        mov     cx,0f04h        ; Set shift count and mask
        mov     ebx,1           ; Set initial size of ASCII
                                ; field for sign
        mov     edi,string_ptr  ; Get address of start of
                                ; ASCII string
        mov     ax,ds           ; Copy ds to es
        mov     es,ax
        cld                     ; Set autoincrement mode
        mov     al,'+'          ; Clear sign field
        test    dl,MINUS        ; Look for negative value
        jz      positive_result

        mov     al,'-'
positive_result:
        stosb                   ; Bump string pointer
                                ; past sign
        and     dl,not MINUS    ; Turn off sign bit
        fwait                   ; Wait for fbstp to finish
;
; Register usage:
;       ah:     BCD byte value in use
;       al:     ASCII character value
;       edx:    Return value
;       ch:     BCD mask = 0fh
;       cl:     BCD shift count = 4
;       ebx:    ASCII string field width
;       esi:    BCD field index
;       edi:    ASCII string field pointer
;       ds,es:  ASCII string segment base
;
; Remove leading zeroes from the number.
;
skip_leading_zeroes:
        mov     ah,bcd_byte[esi] ; Get BCD byte
        mov     al,ah           ; Copy value
        shr     al,cl           ; Get high order digit
        and     al,0fh          ; Set zero flag
        jnz     enter_odd       ; Exit loop if leading
                                ; non zero found

        mov     al,ah           ; Get BCD byte again
        and     al,0fh          ; Get low order digit
        jnz     enter_even      ; Exit loop if non zero
                                ; digit found

        dec     esi             ; Decrement BCD index
        jns     skip_leading_zeroes
;
; The significand was all zeroes.
;
        mov     al,'0'          ; Set initial zero
        stosb
        inc     ebx             ; Bump string length
        jmp     short exit_with_value
;
; Now expand the BCD string into digit
; per byte values 0-9.
;
digit_loop:
        mov     ah,bcd_byte[esi] ; Get BCD byte
        mov     al,ah
        shr     al,cl           ; Get high order digit
enter_odd:
        add     al,'0'          ; Convert to ASCII
        stosb                   ; Put digit into ASCII
                                ; string area
        mov     al,ah           ; Get low order digit
        and     al,0fh
        inc     ebx             ; Bump field size counter
enter_even:
        add     al,'0'          ; Convert to ASCII
        stosb                   ; Put digit into ASCII area
        inc     ebx             ; Bump field size counter
        dec     esi             ; Go to next BCD byte
        jns     digit_loop
;
; Conversion complete. Set the string
; size and remainder.
;
exit_with_value:
        mov     edi,size_ptr
        mov     word ptr [edi],bx
        mov     eax,edx         ; Set return value
        jmp     exit_proc

floating_to_ascii endp
code    ends
        end

ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS.

XENIX286 80386 MACRO ASSEMBLER V1.0, ASSEMBLY OF MODULE GET_POWER_10
OBJECT MODULE PLACED IN power10.obj
ASSEMBLER INVOKED BY: asm386 power10.asm

$title(Calculate the value of 10**ax)
;
; This subroutine will calculate the
; value of 10**eax. For values of
; 0 <= eax < 19, the result will be exact.
; All 80386 registers are transparent
; and the value is returned on the TOS
; as two numbers, exponent in ST(1) and
; fraction in ST(0). The exponent value
; can be larger than the largest
; exponent of an extended real format
; number. Three stack entries are used.
;
        name    get_power_10
        public  get_power_10,power_table
stack   stackseg 8
code    segment public er
;
; Use exact values from 1.0 to 1e18.
;
        even                    ; Optimize 16 bit access
power_table dq  1.0,1e1,1e2,1e3
            dq  1e4,1e5,1e6,1e7
            dq  1e8,1e9,1e10,1e11
            dq  1e12,1e13,1e14,1e15
            dq  1e16,1e17,1e18

get_power_10 proc
        cmp     eax,18          ; Test for 0 <= eax < 19
        ja      out_of_range
        fld     power_table[eax*8] ; Get exact value
        fxtract                 ; Separate power

7.3.1 Function Partitioning
Three separate modules implement the conversion. Most of the work of the conversion is done in the module FLOATING_TO_ASCII. The other modules are provided separately because they have more general uses. One of them, GET_POWER_10, is also used by the ASCII-to-floating-point conversion routine. The other small module, TOS_STATUS, identifies what, if anything, is in the top of the numeric register stack.

7.3.2 Exception Considerations
Care is taken inside the function to avoid generating exceptions. Any possible numeric value is accepted. The only possible exception is insufficient space on the numeric register stack.

The value passed in the numeric stack is checked for existence, type (NaN or infinity), and status (denormal, zero, sign). The string size is tested for a minimum and maximum value. If the top of the register stack is empty, or the string size is too small, the function returns with an error code.

Overflow and underflow are avoided inside the function for very large or very small numbers.
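The type and status checks described above are the ones FXAM reports through condition codes C3, C2, C1 and C0. TOS_STATUS packs those codes into a value from 0 to 15. The Python sketch below is a rough model of that value for a 10-byte extended-real bit pattern. It follows our reading of the FXAM encoding: C3 C2 C0 = 000 unsupported, 001 NaN, 010 normal, 011 infinity, 100 zero, 110 denormal, with C1 = sign. Three simplifications apply:

- The helper names `ext80` and `fxam_code` are hypothetical.
- An empty register (C3 C2 C0 = 101) cannot be recognized from a bit pattern, so it is not modeled.
- Pseudo-denormals are simply grouped with denormals.

```python
# Hypothetical model of the value TOS_STATUS returns (C3,C2,C1,C0 in bits 3..0)
# for a non-empty NPX register holding the given extended-real bit pattern.

def ext80(sign: int, exponent: int, significand: int) -> bytes:
    """Build a 10-byte little-endian extended real: 1 sign bit,
    15 exponent bits, 64-bit significand with an explicit integer bit."""
    value = (sign << 79) | (exponent << 64) | significand
    return value.to_bytes(10, "little")

def fxam_code(raw: bytes) -> int:
    if len(raw) != 10:
        raise ValueError("an extended real is 10 bytes")
    value = int.from_bytes(raw, "little")
    significand = value & ((1 << 64) - 1)
    exponent = (value >> 64) & 0x7FFF
    sign = value >> 79
    integer_bit = significand >> 63
    fraction = significand & ((1 << 63) - 1)

    if exponent == 0x7FFF:
        if not integer_bit:
            c3c2c0 = 0b000          # unsupported encoding
        elif fraction == 0:
            c3c2c0 = 0b011          # infinity
        else:
            c3c2c0 = 0b001          # NaN
    elif exponent == 0:
        # zero, or denormal (pseudo-denormals lumped in here)
        c3c2c0 = 0b100 if significand == 0 else 0b110
    else:
        # normal if the integer bit is set; unnormal or pseudo-zero otherwise
        c3c2c0 = 0b010 if integer_bit else 0b000

    c3 = (c3c2c0 >> 2) & 1
    c2 = (c3c2c0 >> 1) & 1
    c0 = c3c2c0 & 1
    # Same bit order TOS_STATUS builds: c3 in bit 3, c2 in bit 2,
    # c1 (the sign) in bit 1, c0 in bit 0
    return (c3 << 3) | (c2 << 2) | (sign << 1) | c0

ONE = ext80(0, 0x3FFF, 1 << 63)        # +1.0
print(fxam_code(ONE))                   # 4: normal, positive
```

A caller would test the returned value against these codes before attempting a conversion, much as FLOATING_TO_ASCII rejects an empty register.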
7.3.3 Special Instructions
The functions demonstrate the operation of several numeric instructions, different data types, and precision control. The examples show instructions for automatic conversion to BCD, calculating the value of 10 raised to an integer value, establishing and maintaining concurrency, data synchronization, and use of directed rounding on the NPX.

Without the extended precision data type and built-in exponential function, the double precision accuracy of this function could not be attained with the size and speed of the example shown.

The function relies on the numeric BCD data type for conversion from binary floating-point to decimal. It is not difficult to unpack the BCD digits into separate ASCII decimal digits. The major work involves scaling the floating-point value to the comparatively limited range of BCD values. To print a 9-digit result requires accurately scaling the given value to an integer between $10^{8}$ and $10^{9}$. For example, the number +0.123456789 requires a scaling factor of $10^{9}$ to produce the value +123456789.0, which can be stored in 9 BCD digits. The scale factor must be an exact power of 10 to avoid changing any of the printed digit values.

These routines should exactly convert all values exactly representable in decimal in the field size given. Integer values that fit in the given string size are not scaled, but are stored directly into the BCD form. Noninteger values exactly representable in decimal within the string size limits are also exactly converted. For example, 0.125 is exactly representable in both binary and decimal. To convert this floating-point value to decimal, the scaling factor is 1000, resulting in 125. When scaling a value, the function must keep track of where the decimal point lies in the final decimal value.
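The exact scaling and packing steps described above can be modeled in a few lines of Python. This is a sketch, not the routine itself:

- Exact rational arithmetic (`fractions.Fraction`) stands in for the NPX.
- Python's `round` (ties to even) stands in for FRNDINT in the default rounding mode.
- `scale_to_digits`, `pack_bcd18` and `bcd_to_ascii` are hypothetical names.
- `pack_bcd18` assumes the 10-byte packed BCD layout written by FBSTP: 18 digits, two per byte, least significant byte first, with the sign in bit 7 of the last byte.
- The returned power is the power of ten by which the digit string must be multiplied. This is how the sketch keeps track of the decimal point.

```python
from fractions import Fraction
from typing import Tuple

def scale_to_digits(x: float, scale: int) -> Tuple[int, int]:
    """Scale x exactly by 10**scale and round to an integer.

    Returns (digits, power) with x approximately equal to digits * 10**power.
    """
    exact = Fraction(x) * Fraction(10) ** scale   # no rounding error here
    return round(exact), -scale                   # round() ties to even

def pack_bcd18(n: int) -> bytes:
    """Pack an integer with |n| < 10**18 into 10-byte packed BCD."""
    if abs(n) >= 10 ** 18:
        raise ValueError("value does not fit in 18 BCD digits")
    out = bytearray(10)
    out[9] = 0x80 if n < 0 else 0x00              # sign in bit 7 of byte 9
    n = abs(n)
    for i in range(9):                            # byte 0 holds the lowest two digits
        low, n = n % 10, n // 10
        high, n = n % 10, n // 10
        out[i] = (high << 4) | low
    return bytes(out)

def bcd_to_ascii(bcd: bytes) -> str:
    """Unpack from the most significant byte down, dropping leading zeros,
    and prefix '+' or '-' as FLOATING_TO_ASCII does."""
    sign = "-" if bcd[9] & 0x80 else "+"
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in reversed(bcd[:9]))
    return sign + (digits.lstrip("0") or "0")

print(scale_to_digits(0.125, 3))                  # (125, -3)
print(bcd_to_ascii(pack_bcd18(125)))              # +125
```

With 0.125 the scaled value is exactly 125, so no rounding occurs. The value 0.123456789 has no exact binary representation, so its exactly scaled value is not an integer, and rounding yields the expected 123456789.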
7.3.4 Description of Operation
Converting a floating-point number to decimal ASCII takes three major steps: identifying the magnitude of the number, scaling it for the BCD data type, and converting the BCD data type to a decimal ASCII string.

Identifying the magnitude of the result requires finding the value $X$ such that the number is represented by $I \times 10^{X}$, where $1.0 \le I < 10.0$. Scaling the number requires multiplying it by a scaling factor $10^{S}$, so that the result is an integer requiring no more decimal digits than provided for in the ASCII string.

Once scaled, the numeric rounding modes and BCD conversion put the number in a form easy to convert to decimal ASCII by host software.

Implementing each of these three steps requires attention to detail. To begin with, not all floating-point values have a numeric meaning. Values such as infinity, indefinite, or NaN may be encountered by the conversion routine. The conversion routine should recognize these values and identify them uniquely.

Special cases of numeric values also exist. Denormals have numeric values, but should be recognized because they indicate that precision was lost during some earlier calculations.

Once it has been determined that the number has a numeric value, and it is normalized (setting appropriate denormal flags, if necessary, to indicate this to the calling program), the value must be scaled to the BCD range.

7.3.5 Scaling the Value
To scale the number, its magnitude must be determined. It is sufficient to calculate the magnitude to within one unit of the decimal exponent, that is, within a factor of 10 of the required value. After scaling the number, a check is made to see if the result falls in the range expected. If not, the result can be adjusted one decimal order of magnitude up or down. The adjustment test after the scaling is necessary due to inevitable inaccuracies in the scaling value.

Because the magnitude estimate for the scale factor need only be close, a fast technique is used.
The magnitude is estimated by multiplying the power of 2 associated with the number (the unbiased floating-point exponent) by $\log_{10} 2$. Rounding the result to an integer produces an estimate of sufficient accuracy. Ignoring the fraction value can introduce a maximum error of 0.32 in the result.

Using the magnitude of the value and size of the number string, the scaling factor can be calculated. Calculating the scaling factor is the most inaccurate operation of the conversion process. The relation $10^{X} = 2^{X \log_{2} 10}$ is used for this function, and the exponentiate instruction F2XM1 evaluates it. Due to restrictions on the range of values allowed by the F2XM1 instruction, the power of 2 value is split into integer and fraction components. The relation $2^{I+F} = 2^{I} \times 2^{F}$ allows using the FSCALE instruction to recombine the $2^{F}$ value, calculated through F2XM1, and the $2^{I}$ part.

7.3.5.1 Inaccuracy in Scaling
The inaccuracy in calculating the scale factor arises because of the trailing zeros placed into the fraction value of the power of two when stripping off the integer valued bits. For each integer valued bit in the power of 2 value separated from the fraction bits, one bit of precision is lost in the fraction field due to the zero fill occurring in the least significant bits.

Up to 14 bits may be lost in the fraction because the largest allowed floating point exponent value is $2^{14} - 1$. These bits directly reduce the accuracy of the calculated scale factor, thereby reducing the accuracy of the scaled value. For numbers in the range of $10^{\pm 30}$, a maximum of 8 bits of precision is lost in the scaling process.

7.3.5.2 Avoiding Underflow and Overflow
The fraction and exponent fields of the number are separated to avoid underflow and overflow in calculating the scaling values. For example, to scale $10^{-4932}$ to $10^{8}$ requires a scaling factor of $10^{4940}$, which cannot be represented by the NPX.
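The magnitude estimate and the split-exponent form of the scale factor described in 7.3.5 can be modeled in Python, with double precision standing in for the NPX. All names in this sketch are hypothetical:

- `fxtract` mimics the $1 \le F < 2$ convention of FXTRACT on top of `math.frexp`.
- `power_of_ten_split` returns $10^{p}$ as an integer power of two plus a fraction between $2^{-1/2}$ and $2^{1/2}$.
- `scale` adds exponents and multiplies fractions separately, so no intermediate ever holds $10^{p}$ itself.

The sketch only illustrates the structure. It does not reproduce the extended precision of the NPX or its exact error behavior.

```python
import math

LOG10_2 = math.log10(2.0)   # plays the role of FLDLG2
LOG2_10 = math.log2(10.0)   # plays the role of FLDL2T

def fxtract(x: float):
    """Return (fraction, exponent) with x = fraction * 2**exponent and
    1 <= |fraction| < 2, the FXTRACT convention (x must be nonzero)."""
    m, e = math.frexp(x)          # frexp gives 0.5 <= |m| < 1
    return m * 2.0, e - 1

def magnitude_estimate(x: float) -> int:
    """Decimal order of magnitude from the binary exponent alone;
    the fraction is ignored, so the result may be off by one."""
    _, exponent = fxtract(x)
    return round(exponent * LOG10_2)

def power_of_ten_split(p: int):
    """Return (i, f) with 10**p ~= f * 2**i, i an integer, |log2 f| <= 0.5."""
    x = p * LOG2_10               # log2(10**p)
    i = round(x)                  # integer part, cf. FRNDINT
    return i, 2.0 ** (x - i)      # 2**F, the value F2XM1 (+1.0) produces

def scale(x: float, p: int) -> float:
    """x * 10**p, multiplying fractions and adding exponents separately."""
    fx, ex = fxtract(x)
    i, f = power_of_ten_split(p)
    return math.ldexp(fx * f, ex + i)   # final recombination, cf. FSCALE

print(magnitude_estimate(123.0))         # 2
print(power_of_ten_split(400)[0])        # 1329
```

For example, `10.0 ** 400` overflows a double, while `power_of_ten_split(400)` returns the finite pair (1329, about 0.853). The absolute error of `p * LOG2_10` grows with the size of p, so large scale factors carry fewer correct fraction bits. This parallels the loss described in 7.3.5.1.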
By separating the exponent and fraction, the scaling operation involves adding the exponents separately from multiplying the fractions. The exponent arithmetic involves small integers, all easily represented by the NPX.

7.3.5.3 Final Adjustments
It is possible that the power function (GET_POWER_10) could produce a scaling value such that it forms a scaled result larger than the ASCII field could allow. For example, scaling $9.9999999999999999 \times 10^{4900}$ by $1.00000000000000010 \times 10^{-4883}$ produces $1.00000000000000009 \times 10^{18}$. The scale factor is within the accuracy of the NPX and the result is within the conversion accuracy, but it cannot be represented in BCD format. This is why there is a post-scaling test on the magnitude of the result. The result can be multiplied or divided by 10, depending on whether the result was too small or too large, respectively.

7.3.6 Output Format
For maximum flexibility in output formats, the position of the decimal point is indicated by a binary integer called the power value. If the power value is zero, then the decimal point is assumed to be at the right of the rightmost digit. Power values greater than zero indicate how many trailing zeros are not shown. For each unit below zero, move the decimal point one position to the left in the string.

The last step of the conversion is storing the result in BCD and indicating where the decimal point lies. The BCD string is then unpacked into ASCII decimal characters. The ASCII sign is set corresponding to the sign of the original value.

7.4 Trigonometric Calculation Examples (Not Tested)
In this example, the kinematics of a robot arm is modeled with the $4 \times 4$ homogeneous transformation matrices proposed by Denavit and Hartenberg (J. Denavit and R.S. Hartenberg, "A Kinematic Notation for Lower-Pair Mechanisms Based on Matrices," J. Applied Mechanics, June 1955, pp. 215-221; C.S. George Lee, "Robot Arm Kinematics, Dynamics, and Control," IEEE Computer, Dec. 1982).
The translational and rotational relationships between adjacent links are described with these matrices using the D-H matrix method. For each link, there is a $4 \times 4$ homogeneous transformation matrix that represents the link's coordinate system ($L_i$) at the joint ($J_i$) with respect to the previous link's coordinate system ($J_{i-1}$, $L_{i-1}$). The following four geometric quantities completely describe the motion of any rigid joint/link pair ($J_i$, $L_i$), as Figure 7-7 illustrates (see page 7-22 in the printed version of this manual).

$\theta_i$ = the angular displacement of the $x_i$ axis from the $x_{i-1}$ axis by rotating around the $z_{i-1}$ axis (anticlockwise).
$d_i$ = the distance from the origin of the $(i-1)$th coordinate system along the $z_{i-1}$ axis to the $x_i$ axis.
$a_i$ = the distance of the origin of the $i$th coordinate system from the $z_{i-1}$ axis along the $-x_i$ axis.
$\alpha_i$ = the angular displacement of the $z_i$ axis from the $z_{i-1}$ axis about the $x_i$ axis (anticlockwise).

The D-H transformation matrix $A^{i}_{i-1}$ for adjacent coordinate frames (from joint $i-1$ to joint $i$) is calculated as follows:

$$A^{i}_{i-1} = T_{z,d}\, T_{z,\theta}\, T_{x,a}\, T_{x,\alpha}$$

where:
$T_{z,d}$ represents a translation along the $z_{i-1}$ axis,
$T_{z,\theta}$ represents a rotation of angle $\theta$ about the $z_{i-1}$ axis,
$T_{x,a}$ represents a translation along the $x_i$ axis,
$T_{x,\alpha}$ represents a rotation of angle $\alpha$ about the $x_i$ axis.

$$A^{i}_{i-1} = \begin{pmatrix}
\cos\theta_i & -\cos\alpha_i \sin\theta_i & \sin\alpha_i \sin\theta_i & a_i \cos\theta_i \\
\sin\theta_i & \cos\alpha_i \cos\theta_i & -\sin\alpha_i \cos\theta_i & a_i \sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & d_i \\
0 & 0 & 0 & 1
\end{pmatrix}$$

The composite homogeneous matrix $T$, which represents the position and orientation of the joint/link pair with respect to the base system, is obtained by successively multiplying the D-H transformation matrices for adjacent coordinate frames:

$$T^{i}_{0} = A^{1}_{0}\, A^{2}_{1} \cdots A^{i}_{i-1}$$

The example in Figure 7-8 illustrates how the transformation process can be accomplished using the 80387. The program consists of two major procedures.
The first procedure, TRANS_PROC, calculates the elements of each D-H matrix, $A^{i}_{i-1}$. The second procedure, MATRIXMUL_PROC, finds the product of two successive D-H matrices.

XENIX286 80386 MACRO ASSEMBLER V1.0, ASSEMBLY OF MODULE TOS_STATUS
OBJECT MODULE PLACED IN tos.obj
ASSEMBLER INVOKED BY: asm386 tos.asm

$title(Determine TOS register contents)
;
; This subroutine will return a value
; from 0-15 in eax corresponding
; to the contents of NPX TOS. All
; registers are transparent and no
; errors are possible. The return
; value corresponds to c3,c2,c1,c0
; of FXAM instruction.
;
        name    tos_status
        public  tos_status
stack   stackseg 6
code    segment public er

tos_status proc
        fxam                    ; Get status of TOS register
        fstsw   ax              ; Get current status
        mov     al,ah           ; Put bits 10-8 into bits 2-0
        and     eax,4007h       ; Mask out bits c3,c2,c1,c0
        shr     ah,3            ; Put bit c3 into bit 11
        or      al,ah           ; Put c3 into bit 3
        mov     ah,0            ; Clear return value
        ret

tos_status endp
code    ends
        end

ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS.

GET_POWER_10 (continued)

                                ; and fraction
        ret                     ; OK to leave fxtract running
;
; Calculate the value using the
; exponentiate instruction. The following
; relations are used:
;       10**x = 2**(log2(10)*x)
;       2**(I+F) = 2**I * 2**F
; if st(1) = I and st(0) = 2**F then
; fscale produces 2**(I+F)
;
out_of_range:
        fldl2t                  ; TOS = LOG2(10)
        enter   4,0

; save power of 10 value, P
        mov     [ebp-4],eax

; TOS,X = LOG2(10)*P = LOG2(10**P)
        fimul   dword ptr [ebp-4]
        fld1                    ; Set TOS = -1.0
        fchs
        fld     st(1)           ; Copy power value
                                ; in base two
        frndint                 ; TOS = I, X rounded to an
                                ; integer; any rounding mode
                                ; keeps -1.0 < X-I < 1.0
        fxch    st(2)           ; TOS = X, ST(1) = -1.0
                                ; ST(2) = I
        fsub    st,st(2)        ; TOS,F = X-I:
                                ; -1.0 < TOS < 1.0

; Discard the local copy of P (eax is unchanged)
        pop     eax
        f2xm1                   ; TOS = 2**(F) - 1.0
        leave                   ; Restore stack
        fsubr                   ; Form 2**(F)
        ret                     ; OK to leave fsubr running

get_power_10 endp
code    ends
        end

ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS.

Figure 7-8. Robot Arm Kinematics Example

XENIX286 80386 MACRO ASSEMBLER V1.0, ASSEMBLY OF MODULE ROT_MATRIX_CAL
OBJECT MODULE PLACED IN transx.obj
ASSEMBLER INVOKED BY: asm386 transx.asm

        name    ROT_MATRIX_CAL

; This example illustrates the use
; of the 80387 floating point
; instructions, in particular, the
; FSINCOS function which gives both
; the SIN and COS values.
; The program calculates the
; composite matrix for base to
; end-effector transformation.
;
; Only the kinematics is considered in
; this example.
;
; If the composite matrix mentioned above
; is given by:
;       T1n = A1 x A2 x ... x An
; T1n is found by successively calling
; trans_proc and matrixmul_proc until
; all matrices have been exhausted.
;
; trans_proc calculates entries in each
; A(A1,...,An) while matrixmul_proc
; performs the matrix multiplication for
; Ai and Ai+1. matrixmul_proc in turn
; calls matrix_row and matrix_elem to
; do the multiplication.

; Define stack space
trans_stack stackseg 400

; Define the matrix structure for
; 4X4 transformational matrices
a_matrix struc
a11     dq ?
a12     dq ?
a13     dq ?
a14     dq ?
a21     dq ?
a22     dq ?
a23     dq ?
a24     dq ?
a31     dq 0h
a32     dq ?
a33     dq ?
a34     dq ?
a41     dq 0h
a42     dq 0h
a43     dq 0h
a44     dq 1.0          ; double-precision real 1.0
a_matrix ends

; Assume one joint in the storage
; allocation and hence two sets of
; parameters; however, more joints
; are possible
;
; Angles are in degrees, stored as
; double-precision reals to match the
; qword ptr ...[ecx*8] accesses in trans_proc
alp_deg struc
alpha_deg1 dq ?
alpha_deg2 dq ?
alp_deg ends

tht_deg struc
theta_deg1 dq ?
theta_deg2 dq ?
tht_deg ends

A_array struc
A1      dq ?
A2      dq ?
A_array ends

D_array struc
D1      dq ?
D2      dq ?
D_array ends

; trans_data is the data segment
;
trans_data segment rw public

Amx     a_matrix<>
Bmx     a_matrix<>
Tmx     a_matrix<>
ALPHA_DEG alp_deg<>
THETA_DEG tht_deg<>
A_VECTOR A_array<>
D_VECTOR D_array<>
ZERO    dd 0
d180    dd 180
NUM_JOINT equ 1
NUM_ROW equ 4
NUM_COL equ 4
REVERSE db 1h
trans_data ends

        assume ds:trans_data, es:trans_data

; trans_code contains the procedures
; for calculating matrix elements and
; matrix multiplications
trans_code segment er public

; create mnemonics for fsincos which is not
; yet available from ASM386 as of now
codemacro fsincos
        dw      0fbd9h
endm

trans_proc proc far

; Calculate alpha and theta in radians
; from their values in degrees
        fldpi
        fidiv   d180            ; d180 is the integer 180

; Duplicate pi/180
        fld     st

        fmul    qword ptr ALPHA_DEG[ecx*8]
        fxch    st(1)
        fmul    qword ptr THETA_DEG[ecx*8]

; theta(radians) in ST and
; alpha(radians) in ST(1)

; Calculate matrix elements
;       a11 = cos theta
;       a12 = - cos alpha * sin theta
;       a13 = sin alpha * sin theta
;       a14 = A * cos theta
;       a21 = sin theta
;       a22 = cos alpha * cos theta
;       a23 = -sin alpha * cos theta
;       a24 = A * sin theta
;       a32 = sin alpha
;       a33 = cos alpha
;       a34 = D
;       a31 = a41 = a42 = a43 = 0.0
;       a44 = 1

; ebx contains the offset for the matrix
; ecx contains the joint index (0, 1, ...)

        fsincos                 ;cos theta in ST
                                ;sin theta in ST(1)
        fld     st              ;duplicate cos theta
        fst     [ebx].a11       ;cos theta in a11
        fmul    qword ptr A_VECTOR[ecx*8]
        fstp    [ebx].a14       ;A * cos theta in a14
        fxch    st(1)           ;sin theta in ST
        fst     [ebx].a21       ;sin theta in a21
        fld     st              ;duplicate sin theta
        fmul    qword ptr A_VECTOR[ecx*8]
        fstp    [ebx].a24       ;A * sin theta in a24
        fld     st(2)           ;alpha in ST
        fsincos                 ;cos alpha in ST
                                ;sin alpha in ST(1)
                                ;sin theta in ST(2)
                                ;cos theta in ST(3)
        fst     [ebx].a33       ;cos alpha in a33
        fxch    st(1)           ;sin alpha in ST
        fst     [ebx].a32       ;sin alpha in a32
        fld     st(2)           ;sin theta in ST
                                ;sin alpha in ST(1)
        fmul    st,st(1)        ;sin alpha * sin theta
        fstp    [ebx].a13       ;stored in a13
        fmul    st,st(3)        ;cos theta * sin alpha
        fchs                    ;-cos theta * sin alpha
        fstp    [ebx].a23       ;stored in a23
        fld     st(2)           ;cos theta in ST
                                ;cos alpha in ST(1)
                                ;sin theta in ST(2)
                                ;cos theta in ST(3)
        fmul    st,st(1)        ;cos theta * cos alpha
        fstp    [ebx].a22       ;stored in a22
        fmul    st,st(1)        ;cos alpha * sin theta
;
; To take advantage of parallel operations
; between the CPU and NPX
;
        push    eax             ; save eax
;
; also move D into a34 in a faster way
        mov     eax, dword ptr D_VECTOR[ecx*8]
        mov     dword ptr [ebx + 88], eax
        mov     eax, dword ptr D_VECTOR[ecx*8 + 4]
        mov     dword ptr [ebx + 92], eax
        pop     eax             ; restore eax
        fchs                    ;-cos alpha * sin theta
        fstp    [ebx].a12       ;stored in a12
                                ;and all nonzero elements
                                ;have been calculated
        fstp    st(0)           ;discard sin theta,
        fstp    st(0)           ;cos theta and alpha,
        fstp    st(0)           ;leaving the NPX stack empty
        ret

trans_proc endp


matrix_elem proc far

; This procedure calculates the dot product
; of the ith row of the first matrix and
; the jth column of the second matrix:
;
;       Tij where Tij = sum of Aik x Bkj over k
;
; parameters passed from the calling routine,
; matrix_row:
;       EAX = offset of the first matrix, A
;       EBX = offset of the second matrix, B
;       EDX = offset of the output matrix, T
;       ESI = (i-1)*8
;       EDI = (j-1)*8
; local register, EBP = (k-1)*8
;
        push    ebp             ; save ebp
        push    ecx             ; ecx to be used as a tmp reg
        fldz                    ; clear the running sum for Tij
                                ; on the NPX stack
        xor     ebp, ebp        ; clear ebp, which will be
                                ; used as a temp reg to index (k)
                                ; across the ith row of the first
                                ; matrix as well as down the jth
                                ; column of the second matrix

NXT_k:
; locating Aik in the first matrix, A
        mov     ecx, esi
        imul    ecx, NUM_COL    ; ecx contains offset due
                                ; to preceding rows; the
                                ; offset is from the
                                ; beginning of the matrix
        add     ecx, ebp        ; get to the kth column entry
                                ; of the ith row of the A matrix
        fld     qword ptr [eax][ecx] ; load Aik into 80387

; locating Bkj
        mov     ecx, ebp
        imul    ecx, NUM_COL    ; ecx contains the offset
                                ; of the beginning of the
                                ; kth row from the
                                ; beginning of the B matrix
        add     ecx, edi        ; get to the jth column entry
                                ; of the kth row of the B
                                ; matrix
        fmul    qword ptr [ebx][ecx] ; Aik * Bkj

; accumulating the sum of Aik * Bkj
        faddp   st(1), st

; increment k by 1, i.e., ebp by 8
        add     ebp, 8

; Has k reached the width of the matrix yet?
        cmp     ebp, NUM_COL*8
        jl      NXT_k

; store the sum in the output matrix, Tij
        mov     ecx, esi
        imul    ecx, NUM_COL
        add     ecx, edi
        fstp    qword ptr [edx][ecx]

; Restore registers
        pop     ecx             ; restore ecx
        pop     ebp             ; restore ebp
        ret

matrix_elem endp


matrix_row proc far

        xor     edi, edi
; scan across a row

NXT_COL:
        call    matrix_elem
        add     edi, 8
        cmp     edi, NUM_COL*8
        jl      NXT_COL
        ret

matrix_row endp


matrixmul_proc proc far

; This procedure does the matrix
; multiplication by calling matrix_row
; to calculate entries in each row
;
; The matrix multiplication is
; performed in the following manner,
;       Tij = sum over k of Aik x Bkj
; where i and j denote the row and column
; respectively and k is the index for
; scanning across the ith row of the
; first matrix and the jth column of the
; second matrix.
; The three matrix offsets lie on the stack
; above the far return address (8 bytes)
        mov     edx, dword ptr [esp+8]  ; offset of output matrix in edx
        mov     ebx, dword ptr [esp+12] ; offset of second input matrix in ebx
        mov     eax, dword ptr [esp+16] ; offset of first input matrix in eax

; setup esi and edi
; edi points to the column
; esi points to the row

        xor     esi, esi        ; clear esi

NXT_ROW:
        call    matrix_row
        add     esi, 8
        cmp     esi, NUM_ROW*8
        jl      NXT_ROW
        ret     12              ; far return, removing the
                                ; three offsets from the stack

matrixmul_proc endp

trans_code ends

;***************************************
;                                      ;
;             Main program             ;
;                                      ;
;***************************************

main_code segment er

START:

        mov     esp, stackstart trans_stack
; save all registers

        pushad

; ECX denotes the number of joints
; where no of matrices = NUM_JOINT + 1
; Find the first matrix (from the base
; of the system to the first joint)
; and call it Bmx
        xor     ecx, ecx        ; 1st matrix
        mov     ebx, offset Bmx ;
        call    trans_proc      ; is Bmx
        inc     ecx

NXT_MATRIX:
; From the 2nd matrix and on, it
; will be stored in Amx.
; The result from the first matrix mult.
; is stored in Tmx but will be accessed
; as Bmx in the next multiplication.
; As a matter of fact, the roles of Bmx
; and Tmx alternate in successive
; multiplications. This is achieved by
; reversing the order of the Bmx and Tmx
; pointers being passed onto the program
; stack. Thus, this is invisible to the
; matrix multiplication procedure.
; REVERSE serves as the indicator;
; REVERSE = 0 means that the result
; is to be placed in Tmx.
        mov     ebx, offset Amx         ; find Amx
        call    trans_proc
        inc     ecx
        xor     REVERSE, 1h
        jnz     Bmx_as_Tmx

        ; No reversing: Bmx is the second input
        ; matrix and Tmx is the output matrix.
        push    offset Amx
        push    offset Bmx
        push    offset Tmx
        jmp     CONTINUE

        ; Reversing: Tmx is the second input
        ; matrix and Bmx is the output matrix.
Bmx_as_Tmx:
        push    offset Amx
        push    offset Tmx              ; reversing the
        push    offset Bmx              ; pointers passed

CONTINUE:
        call    matrixmul_proc
        cmp     ecx, NUM_JOINT
        jle     NXT_MATRIX

        ; If REVERSE = 1, the final answer
        ; is in Bmx; otherwise, it is in Tmx.

        popad

main_code       ends

        end     START, ds:trans_data, ss:trans_stack

ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS.
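The listing's three nested procedures and the Bmx/Tmx alternation can be summarized in a short Python sketch. It illustrates the computation and is not a translation of the program. It assumes 4 x 4 matrices of Python floats; the listing's NUM_ROW and NUM_COL are 4, since NUM_COL*8 assembles to 20H. It also clears each output element before accumulating, because the part of matrix_elem shown above adds each product into Tij with FADD and FSTP. The names matmul_into, chain, and scale are hypothetical. As in main_code, each new matrix multiplies the running product from the left, and the result alternates between two buffers instead of being copied back.

```python
# Sketch of the listing's computation: chained 4 x 4 products T = A x B,
# with two result buffers whose roles alternate (the REVERSE toggle).
N = 4  # NUM_ROW = NUM_COL = 4 in the listing

def matmul_into(a, b, t):
    """Overwrite t with A x B; t must be a different buffer from a and b."""
    for i in range(N):              # rows, as in matrixmul_proc (NXT_ROW)
        for j in range(N):          # columns, as in matrix_row (NXT_COL)
            t[i][j] = 0.0           # Tij must start at zero before accumulating
            for k in range(N):      # scan k, as in matrix_elem (NXT_k)
                t[i][j] += a[i][k] * b[k][j]

def chain(matrices):
    """Return M[n] x ... x M[1] x M[0]; each new matrix multiplies from the left."""
    bufs = [[row[:] for row in matrices[0]],   # first matrix plays Bmx
            [[0.0] * N for _ in range(N)]]     # the other buffer plays Tmx
    reverse = 0                     # index of the buffer holding the running product
    for m in matrices[1:]:          # each later matrix plays Amx
        matmul_into(m, bufs[reverse], bufs[reverse ^ 1])
        reverse ^= 1                # like "xor REVERSE, 1h": swap the buffers' roles
    return bufs[reverse]

def scale(s):
    """Diagonal 4 x 4 matrix with s on the diagonal."""
    return [[s if i == j else 0.0 for j in range(N)] for i in range(N)]

m0 = scale(1.0)
m0[0][3] = 5.0                      # a translation-like entry in the first matrix
r = chain([m0, scale(2.0), scale(3.0)])
print(r[0][0], r[0][3])             # 6.0 30.0
```

Swapping buffer indices mirrors the listing's trick of reversing the pushed pointers. Neither version copies the intermediate product back into a fixed input matrix.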
Appendix A Machine Instruction Encoding and Decoding

1st byte
Hex Binary      2nd Byte       Bytes 3-7    ASM386 Instruction Format
D8  1101 1000   MOD 000 R/M    SIB, displ   FADD single-real
D8  1101 1000   MOD 001 R/M    SIB, displ   FMUL single-real
D8  1101 1000   MOD 010 R/M    SIB, displ   FCOM single-real
D8  1101 1000   MOD 011 R/M    SIB, displ   FCOMP single-real
D8  1101 1000   MOD 100 R/M    SIB, displ   FSUB single-real
D8  1101 1000   MOD 101 R/M    SIB, displ   FSUBR single-real
D8  1101 1000   MOD 110 R/M    SIB, displ   FDIV single-real
D8  1101 1000   MOD 111 R/M    SIB, displ   FDIVR single-real
D8  1101 1000   1100 0 REG                  FADD ST,ST(i)
D8  1101 1000   1100 1 REG                  FMUL ST,ST(i)
D8  1101 1000   1101 0 REG                  FCOM ST(i)
D8  1101 1000   1101 1 REG                  FCOMP ST(i)
D8  1101 1000   1110 0 REG                  FSUB ST,ST(i)
D8  1101 1000   1110 1 REG                  FSUBR ST,ST(i)
D8  1101 1000   1111 0 REG                  FDIV ST,ST(i)
D8  1101 1000   1111 1 REG                  FDIVR ST,ST(i)
D9  1101 1001   MOD 000 R/M    SIB, displ   FLD single-real
D9  1101 1001   MOD 001 R/M                 reserved
D9  1101 1001   MOD 010 R/M    SIB, displ   FST single-real
D9  1101 1001   MOD 011 R/M    SIB, displ   FSTP single-real
D9  1101 1001   MOD 100 R/M    SIB, displ   FLDENV 14 or 28 bytes (1)
D9  1101 1001   MOD 101 R/M    SIB, displ   FLDCW 2 bytes
D9  1101 1001   MOD 110 R/M    SIB, displ   FSTENV 14 or 28 bytes (1)
D9  1101 1001   MOD 111 R/M    SIB, displ   FSTCW 2 bytes
D9  1101 1001   1100 0 REG                  FLD ST(i)
D9  1101 1001   1100 1 REG                  FXCH ST(i)
D9  1101 1001   1101 0000                   FNOP
D9  1101 1001   1101 0001                   reserved
D9  1101 1001   1101 001-                   reserved
D9  1101 1001   1101 01--                   reserved
D9  1101 1001   1101 1 REG                  reserved
D9  1101 1001   1110 0000                   FCHS
D9  1101 1001   1110 0001                   FABS
D9  1101 1001   1110 001-                   reserved
D9  1101 1001   1110 0100                   FTST
D9  1101 1001   1110 0101                   FXAM
D9  1101 1001   1110 011-                   reserved
D9  1101 1001   1110 1000                   FLD1
D9  1101 1001   1110 1001                   FLDL2T
D9  1101 1001   1110 1010                   FLDL2E
D9  1101 1001   1110 1011                   FLDPI
D9  1101 1001   1110 1100                   FLDLG2
D9  1101 1001   1110 1101                   FLDLN2
D9  1101 1001   1110 1110                   FLDZ
D9  1101 1001   1110 1111                   reserved
D9  1101 1001   1111 0000                   F2XM1
D9  1101 1001   1111 0001                   FYL2X
D9  1101 1001   1111 0010                   FPTAN
D9  1101 1001   1111 0011                   FPATAN
D9  1101 1001   1111 0100                   FXTRACT
D9  1101 1001   1111 0101                   FPREM1
D9  1101 1001   1111 0110                   FDECSTP
D9  1101 1001   1111 0111                   FINCSTP
D9  1101 1001   1111 1000                   FPREM
D9  1101 1001   1111 1001                   FYL2XP1
D9  1101 1001   1111 1010                   FSQRT
D9  1101 1001   1111 1011                   FSINCOS
D9  1101 1001   1111 1100                   FRNDINT
D9  1101 1001   1111 1101                   FSCALE
D9  1101 1001   1111 1110                   FSIN
D9  1101 1001   1111 1111                   FCOS
DA  1101 1010   MOD 000 R/M    SIB, displ   FIADD short-integer
DA  1101 1010   MOD 001 R/M    SIB, displ   FIMUL short-integer
DA  1101 1010   MOD 010 R/M    SIB, displ   FICOM short-integer
DA  1101 1010   MOD 011 R/M    SIB, displ   FICOMP short-integer
DA  1101 1010   MOD 100 R/M    SIB, displ   FISUB short-integer
DA  1101 1010   MOD 101 R/M    SIB, displ   FISUBR short-integer
DA  1101 1010   MOD 110 R/M    SIB, displ   FIDIV short-integer
DA  1101 1010   MOD 111 R/M    SIB, displ   FIDIVR short-integer
DA  1101 1010   110- ----                   reserved
DA  1101 1010   1110 0---                   reserved
DA  1101 1010   1110 1000                   reserved
DA  1101 1010   1110 1001                   FUCOMPP
DA  1101 1010   1110 101-                   reserved
DA  1101 1010   1110 11--                   reserved
DA  1101 1010   1111 ----                   reserved
DB  1101 1011   MOD 000 R/M    SIB, displ   FILD short-integer
DB  1101 1011   MOD 001 R/M    SIB, displ   reserved
DB  1101 1011   MOD 010 R/M    SIB, displ   FIST short-integer
DB  1101 1011   MOD 011 R/M    SIB, displ   FISTP short-integer
DB  1101 1011   MOD 100 R/M    SIB, displ   reserved
DB  1101 1011   MOD 101 R/M    SIB, displ   FLD extended-real
DB  1101 1011   MOD 110 R/M    SIB, displ   reserved
DB  1101 1011   MOD 111 R/M    SIB, displ   FSTP extended-real
DB  1101 1011   110- ----                   reserved
DB  1101 1011   1110 0000                   FENI, executed as FNOP (2)
DB  1101 1011   1110 0001                   FDISI, executed as FNOP (2)
DB  1101 1011   1110 0010                   FCLEX
DB  1101 1011   1110 0011                   FINIT
DB  1101 1011   1110 0100                   FSETPM, executed as FNOP (2)
DB  1101 1011   1110 0101                   reserved
DB  1101 1011   1110 011-                   reserved
DB  1101 1011   1110 1---                   reserved
DB  1101 1011   1111 ----                   reserved
DC  1101 1100   MOD 000 R/M    SIB, displ   FADD double-real
DC  1101 1100   MOD 001 R/M    SIB, displ   FMUL double-real
DC  1101 1100   MOD 010 R/M    SIB, displ   FCOM double-real
DC  1101 1100   MOD 011 R/M    SIB, displ   FCOMP double-real
DC  1101 1100   MOD 100 R/M    SIB, displ   FSUB double-real
DC  1101 1100   MOD 101 R/M    SIB, displ   FSUBR double-real
DC  1101 1100   MOD 110 R/M    SIB, displ   FDIV double-real
DC  1101 1100   MOD 111 R/M    SIB, displ   FDIVR double-real
DC  1101 1100   1100 0 REG                  FADD ST(i),ST
DC  1101 1100   1100 1 REG                  FMUL ST(i),ST
DC  1101 1100   1101 0 REG                  reserved
DC  1101 1100   1101 1 REG                  reserved
DC  1101 1100   1110 0 REG                  FSUBR ST(i),ST
DC  1101 1100   1110 1 REG                  FSUB ST(i),ST
DC  1101 1100   1111 0 REG                  FDIVR ST(i),ST
DC  1101 1100   1111 1 REG                  FDIV ST(i),ST
DD  1101 1101   MOD 000 R/M    SIB, displ   FLD double-real
DD  1101 1101   MOD 001 R/M                 reserved
DD  1101 1101   MOD 010 R/M    SIB, displ   FST double-real
DD  1101 1101   MOD 011 R/M    SIB, displ   FSTP double-real
DD  1101 1101   MOD 100 R/M    SIB, displ   FRSTOR 94 or 108 bytes (1)
DD  1101 1101   MOD 101 R/M    SIB, displ   reserved
DD  1101 1101   MOD 110 R/M    SIB, displ   FSAVE 94 or 108 bytes (1)
DD  1101 1101   MOD 111 R/M    SIB, displ   FSTSW 2 bytes
DD  1101 1101   1100 0 REG                  FFREE ST(i)
DD  1101 1101   1100 1 REG                  reserved
DD  1101 1101   1101 0 REG                  FST ST(i)
DD  1101 1101   1101 1 REG                  FSTP ST(i)
DD  1101 1101   1110 0 REG                  FUCOM ST(i)
DD  1101 1101   1110 1 REG                  FUCOMP ST(i)
DD  1101 1101   1111 ----                   reserved
DE  1101 1110   MOD 000 R/M    SIB, displ   FIADD word-integer
DE  1101 1110   MOD 001 R/M    SIB, displ   FIMUL word-integer
DE  1101 1110   MOD 010 R/M    SIB, displ   FICOM word-integer
DE  1101 1110   MOD 011 R/M    SIB, displ   FICOMP word-integer
DE  1101 1110   MOD 100 R/M    SIB, displ   FISUB word-integer
DE  1101 1110   MOD 101 R/M    SIB, displ   FISUBR word-integer
DE  1101 1110   MOD 110 R/M    SIB, displ   FIDIV word-integer
DE  1101 1110   MOD 111 R/M    SIB, displ   FIDIVR word-integer
DE  1101 1110   1100 0 REG                  FADDP ST(i),ST
DE  1101 1110   1100 1 REG                  FMULP ST(i),ST
DE  1101 1110   1101 0---                   reserved
DE  1101 1110   1101 1000                   reserved
DE  1101 1110   1101 1001                   FCOMPP
DE  1101 1110   1101 101-                   reserved
DE  1101 1110   1101 11--                   reserved
DE  1101 1110   1110 0 REG                  FSUBRP ST(i),ST
DE  1101 1110   1110 1 REG                  FSUBP ST(i),ST
DE  1101 1110   1111 0 REG                  FDIVRP ST(i),ST
DE  1101 1110   1111 1 REG                  FDIVP ST(i),ST
DF  1101 1111   MOD 000 R/M    SIB, displ   FILD word-integer
DF  1101 1111   MOD 001 R/M    SIB, displ   reserved
DF  1101 1111   MOD 010 R/M    SIB, displ   FIST word-integer
DF  1101 1111   MOD 011 R/M    SIB, displ   FISTP word-integer
DF  1101 1111   MOD 100 R/M    SIB, displ   FBLD packed-decimal
DF  1101 1111   MOD 101 R/M    SIB, displ   FILD long-integer
DF  1101 1111   MOD 110 R/M    SIB, displ   FBSTP packed-decimal
DF  1101 1111   MOD 111 R/M    SIB, displ   FISTP long-integer
DF  1101 1111   1100 0 REG                  reserved
DF  1101 1111   1100 1 REG                  reserved
DF  1101 1111   1101 0 REG                  reserved
DF  1101 1111   1101 1 REG                  reserved
DF  1101 1111   1110 0000                   FSTSW AX
DF  1101 1111   1110 0001                   reserved
DF  1101 1111   1110 001-                   reserved
DF  1101 1111   1110 01--                   reserved
DF  1101 1111   1110 1---                   reserved
DF  1101 1111   1111 ----                   reserved

(1) The size of operand transferred depends on the 80386 operand-size attribute in effect for the instruction.
(2) This encoding can be generated by the language translators; however, the 80387 treats it as FNOP. It corresponds to the 8087 or 80287 instruction named in the table.

Appendix B Exception Summary

The following table lists the instruction mnemonics in alphabetical order. For each mnemonic, it summarizes the exceptions that the instruction may cause. When writing 80387 programs that may be used in an environment that employs numerics exception handlers, assembly-language programmers should be aware of the possible exceptions for each instruction in order to determine the need for exception synchronization. Chapter 4 explains the need for exception synchronization.

  IS = Invalid operand due to stack overflow/underflow
  I  = Invalid operand due to other cause
  D  = Denormal operand
  Z  = Zero-divide
  O  = Overflow
  U  = Underflow
  P  = Inexact result (precision)

Mnemonic                 Instruction              IS I  D  Z  O  U  P
F2XM1                    2^X - 1                  Y  Y  Y        Y  Y
FABS                     Absolute value           Y
FADD(P)                  Add real                 Y  Y  Y     Y  Y  Y
FBLD                     BCD load                 Y
FBSTP                    BCD store and pop        Y  Y              Y
FCHS                     Change sign              Y
FCLEX                    Clear exceptions
FCOM(P)(P)               Compare real             Y  Y  Y
FCOS                     Cosine                   Y  Y  Y        Y  Y
FDECSTP                  Decrement stack pointer
FDIV(R)(P)               Divide real              Y  Y  Y  Y  Y  Y  Y
FFREE                    Free register
FIADD                    Integer add              Y  Y  Y     Y  Y  Y
FICOM(P)                 Integer compare          Y  Y  Y
FIDIV                    Integer divide           Y  Y  Y  Y     Y  Y
FIDIVR                   Integer divide reversed  Y  Y  Y  Y  Y  Y  Y
FILD                     Integer load             Y
FIMUL                    Integer multiply         Y  Y  Y     Y  Y  Y
FINCSTP                  Increment stack pointer
FINIT                    Initialize processor
FIST(P)                  Integer store            Y  Y              Y
FISUB(R)                 Integer subtract         Y  Y  Y     Y  Y  Y
FLD extended or stack    Load real                Y
FLD single or double     Load real                Y  Y  Y
FLD1                     Load +1.0                Y
FLDCW                    Load control word        Y  Y  Y  Y  Y  Y  Y
FLDENV                   Load environment         Y  Y  Y  Y  Y  Y  Y
FLDL2E                   Load log2(e)             Y
FLDL2T                   Load log2(10)            Y
FLDLG2                   Load log10(2)            Y
FLDLN2                   Load ln(2)               Y
FLDPI                    Load π                   Y
FLDZ                     Load +0.0                Y
FMUL(P)                  Multiply real            Y  Y  Y     Y  Y  Y
FNOP                     No operation
FPATAN                   Partial arctangent       Y  Y  Y        Y  Y
FPREM                    Partial remainder        Y  Y  Y        Y
FPREM1                   IEEE partial remainder   Y  Y  Y        Y
FPTAN                    Partial tangent          Y  Y  Y        Y  Y
FRNDINT                  Round to integer         Y  Y  Y           Y
FRSTOR                   Restore state            Y  Y  Y  Y  Y  Y  Y
FSAVE                    Save state
FSCALE                   Scale                    Y  Y  Y     Y  Y  Y
FSIN                     Sine                     Y  Y  Y        Y  Y
FSINCOS                  Sine and cosine          Y  Y  Y        Y  Y
FSQRT                    Square root              Y  Y  Y           Y
FST(P) stack or extended Store real               Y
FST(P) single or double  Store real               Y  Y  Y     Y  Y  Y
FSTCW                    Store control word
FSTENV                   Store environment
FSTSW (AX)               Store status word
FSUB(R)(P)               Subtract real            Y  Y  Y     Y  Y  Y
FTST                     Test                     Y  Y  Y
FUCOM(P)(P)              Unordered compare real   Y  Y  Y
FWAIT                    CPU wait
FXAM                     Examine
FXCH                     Exchange registers       Y
FXTRACT                  Extract                  Y  Y  Y  Y
FYL2X                    Y * log2(X)              Y  Y  Y  Y  Y  Y  Y
FYL2XP1                  Y * log2(X + 1)          Y  Y  Y        Y  Y

Appendix C Compatibility Between the 80387 and the 80287/8087

This appendix summarizes the differences between the 80387 and its predecessors, the 80287 and the 8087, and analyzes the impact of these differences on software that must be transported from the 80287 or 8087 to the 80387. Any migration from the 8087 directly to the 80387 must also take into account the additional differences between the 8087 and the 80387 listed in Appendix D of this manual.

Each issue below is described by five fields: 80387 behavior, 8087/80287 behavior, impact on software, and reason for the difference.

C.1 INITIALIZATION SEQUENCE

RESET, FINIT, and ERROR# Pin
  80387 behavior: After a hardware RESET, the ERROR# output is asserted to indicate that an 80387 is present. To accomplish this, the IE and ES bits of the status word are set, and the IM bit in the control word is reset. After FINIT, the status word and the control word have the same values as in an 80287/8087 after RESET.
  8087/80287 behavior: No difference between RESET and FINIT.
  Impact on software: 80387 initialization software must execute an FNINIT instruction to clear ERROR#. The FNINIT is not required for 80287/8087 software, though Intel documentation recommends its use (refer to the Numerics Supplement to the iAPX 286 Programmer's Reference Manual).
  Reason for the difference: Permits the 80386 to differentiate between the 80287 and the 80387.

C.2 DATA TYPES AND EXCEPTION HANDLING

NaN
  80387 behavior: The 80387 distinguishes between signaling NaNs and quiet NaNs. The 80387 only generates quiet NaNs. An invalid-operation exception is raised only upon encountering a signaling NaN (except for FCOM, FIST, and FBSTP, which also raise IE for quiet NaNs).
  8087/80287 behavior: The 80287/8087 only generates one kind of NaN (the equivalent of a quiet NaN) but raises an invalid-operation exception upon encountering any kind of NaN.
  Impact on software: Uninitialized memory locations that contain QNaNs should be changed to SNaNs to cause the 80387 to fault when uninitialized memory locations are referenced.
  Reason for the difference: IEEE Standard 754 compatibility.

Pseudozero, Pseudo-NaN, Pseudoinfinity, and Unnormal Formats
  80387 behavior: The 80387 neither generates nor supports these formats; it raises an invalid-operation exception whenever it encounters them in an arithmetic operation.
  8087/80287 behavior: The 80287/8087 defines and supports special handling for these formats.
  Impact on software: None. The 80387 does not generate these formats, and therefore will not encounter them unless a programmer deliberately enters them.
  Reason for the difference: IEEE Standard 754 compatibility.

Tag Word Bits for Unsupported Data Formats
  80387 behavior: The encoding in the tag word for the unsupported data formats mentioned in Section C.2.2 is "special data" (type 10).
  8087/80287 behavior: The encoding for pseudozero and unnormal is "valid" (type 00); the others are "special data" (type 10).
  Impact on software: The exception handler may need to be changed if programmers use such data types.
  Reason for the difference: IEEE Standard 754 compatibility.

Invalid-Operation Exception
  80387 behavior: No invalid-operation exception is raised upon encountering a denormal in FSQRT, FDIV, or FPREM or upon conversion to BCD or to integer. The operation proceeds by first normalizing the value.
  8087/80287 behavior: Upon encountering a denormal in FSQRT, FDIV, or FPREM or upon conversion to BCD or to integer, the invalid-operation exception is raised.
  Impact on software: None. Software on the 80387 will continue to execute in cases where the 80287/8087 would trap.
  Reason for the difference: Upgrade, to eliminate exception.

Denormal Exception
  80387 behavior: The denormal exception is raised in transcendental instructions and FXTRACT.
  8087/80287 behavior: The denormal exception is not raised in transcendental instructions and FXTRACT.
  Impact on software: The exception handler needs to be changed only if it gives special treatment to different opcodes.
  Reason for the difference: Performance enhancement for the normal case.

Overflow Exception
  Overflow exception masked:
    80387 behavior: If the rounding mode is set to chop (toward zero), the result is the most positive or most negative number.
    8087/80287 behavior: The 80287/8087 does not signal the overflow exception when the masked response is not infinity; i.e., it signals overflow only when the rounding control is not set to round to zero. If rounding is set to chop (toward zero), the result is positive or negative infinity.
    Impact on software: Under the most common rounding modes, no impact. If rounding is toward zero (chop), a program on the 80387 produces, under overflow conditions, a result that differs in the least significant bit of the significand from the result on the 80287.
  Overflow exception not masked:
    80387 behavior: The precision exception is flagged. When the result is stored in the stack, the significand is rounded according to the precision control (PC) bit of the control word or according to the opcode.
    8087/80287 behavior: The precision exception is not flagged and the significand is not rounded.
    Impact on software: If the result is stored on the stack, a program on the 80387 produces a different result under overflow conditions than on the 80287/8087. The difference is apparent only to the exception handler.
  Reason for the difference: IEEE Standard 754 compatibility.

Underflow Exception
  Conditions for underflow:
    80387 behavior: When the underflow exception is masked, the underflow exception is signaled when both the result is tiny and denormalization results in a loss of accuracy.
    8087/80287 behavior: When the underflow exception is masked and rounding is toward zero, the underflow exception flag is raised on tininess, regardless of loss of accuracy.
    Impact on software (underflow exception masked): No impact. The underflow exception occurs less often when rounding is toward zero.
  Response to underflow:
    80387 behavior: When the underflow exception is unmasked and the instruction is supposed to store the result on the stack, the significand is rounded to the appropriate precision (according to the precision control (PC) bit of the control word for those instructions controlled by PC, otherwise to extended precision).
    8087/80287 behavior: When the underflow exception is not masked and the destination is the stack, the significand is not rounded but rather is left as is.
    Impact on software (underflow exception not masked): A program on the 80387 produces a different result during underflow conditions than on the 80287/8087 if the result is stored on the stack. The difference is only in the least significant bit of the significand and is apparent only to the exception handler.
  Reason for the difference: IEEE Standard 754 compatibility. Two related events contribute to underflow:
    1. The creation of a tiny result. A tiny number, because it is so small, may cause some other exception later (such as overflow upon division).
    2. Loss of accuracy during the denormalization of a tiny number.
    Which of these events triggers the underflow exception depends on whether the underflow exception is masked.

Exception Precedence
  80387 behavior: There is no difference in the precedence of the denormal exception, whether it be masked or not.
  8087/80287 behavior: When the denormal exception is not masked, it takes precedence over all other exceptions.
  Impact on software: None, but some unneeded normalization of denormal operands is prevented on the 80387.
  Reason for the difference: Operational improvement.

C.3 TAG, STATUS, AND CONTROL WORDS

Bits C3-C0 of Status Word
  80387 behavior: After FINIT, incomplete FPREM, and hardware reset, the 80387 sets these bits to zero.
  8087/80287 behavior: After FINIT, incomplete FPREM, and hardware reset, the 80287/8087 leaves these bits intact (they contain the prior value).
  Impact on software: None.
  Reason for the difference: Upgrade, to provide consistent state after reset.

Bit C2 of Status Word
  80387 behavior: Bit 10 (C2) serves as an incomplete bit for FPTAN.
  8087/80287 behavior: This bit is undefined for FPTAN.
  Impact on software: None. Programs don't check C2 after FPTAN.
  Reason for the difference: Upgrade, to allow fast checking of operand range.

Infinity Control
  80387 behavior: Only affine closure is supported. Bit 12 remains programmable but has no effect on 80387 operation.
  8087/80287 behavior: Both affine and projective closures are supported. After RESET, the default value in the control word is projective.
  Impact on software: Software that requires projective infinity arithmetic may give different results.
  Reason for the difference: IEEE Standard 754 compatibility.

Status Word Bit 6 for Stack Fault
  80387 behavior: When an invalid-operation exception occurs due to stack overflow or underflow, not only is bit 0 (IE) of the status word set, but bit 6 is also set to indicate a stack fault, and bit 9 (C1) specifies overflow or underflow. Bit 6 is called SF and serves to distinguish invalid exceptions caused by stack overflow/underflow from those caused by numeric operations.
  8087/80287 behavior: When an invalid-operation exception occurs due to stack overflow or underflow, only bit 0 (IE) of the status word is set. Bit 6 is RESERVED.
  Impact on software: None. Existing exception handlers need not change, but may be upgraded to take advantage of the additional information. Newly written handlers will be more effective.
  Reason for the difference: Upgrade and performance improvement.

Tag Word
  80387 behavior: When loading the tag word with an FLDENV or FRSTOR instruction, the only interpretations of tag values used by the 80387 are empty (value 11) and nonempty (values 00, 01, and 10). Subsequent operations on a nonempty register always examine the value in the register, not the value in its tag. The FSTENV and FSAVE instructions examine the nonempty registers and put the correct values in the tags before storing the tag word.
  8087/80287 behavior: The corresponding tag is checked before each register access to determine the class of operand in the register; the tag is updated after every change to a register so that the tag always reflects the most recent status of the register. Programmers can load a tag with a value that disagrees with the contents of a register (for example, the register contains valid contents, but the tag says special); the 80287/8087, in this case, honors the tag and does not examine the register.
  Impact on software: Software may not operate correctly if it uses FLDENV or FRSTOR to change tags to values (other than empty) that are different from actual register contents.
  Reason for the difference: Performance improvement.

C.4 INSTRUCTION SET

FBSTP, FDIV, FIST(P), FPREM, FSQRT
  80387 behavior: Operation on a denormal operand is supported. An underflow exception can occur.
  8087/80287 behavior: Operation on a denormal operand raises the invalid-operation exception. Underflow is not possible.
  Impact on software: The exception handler for underflow may require change only if it gives different treatment to different opcodes. Possibly fewer invalid-operation exceptions will occur.
  Reason for the difference: IEEE Standard 754 compatibility.

FSCALE
  80387 behavior: The range of the scaling operand is not restricted. If 0 < ST(1) < 1, the scaling factor is zero; therefore, ST(0) remains unchanged. If the rounded result is not exact or if there was a loss of accuracy (masked underflow), the precision exception is signaled.
  8087/80287 behavior: The range of the scaling operand is restricted. If 0 < ST(1) < 1, the result is undefined and no exception is signaled.
  Impact on software: Different result when 0 < ST(1) < 1.
  Reason for the difference: Upgrade.

FPREM1
  80387 behavior: Performs partial remainder according to IEEE Standard 754.
  8087/80287 behavior: Does not exist.
  Impact on software: None.
  Reason for the difference: IEEE Standard 754 compatibility and upgrade.

FPREM
  80387 behavior: Bits C0, C3, C1 of the status word correctly reflect the three low-order bits of the quotient.
  8087/80287 behavior: The quotient bits are incorrect when performing a reduction of 64^N + M when N ≥ 1 and M = 1 or M = 2.
  Impact on software: None. Software that works around the bug should not be affected.
  Reason for the difference: Upgrade.

FUCOM, FUCOMP, FUCOMPP
  80387 behavior: Perform unordered compare according to IEEE Standard 754.
  8087/80287 behavior: Do not exist.
  Impact on software: None.
  Reason for the difference: IEEE Standard 754 compatibility.

FPTAN
  Operand range:
    80387 behavior: Range of operand is much less restricted (ST(0) < 2^63); the operand is reduced internally using an internal π/4 constant that is more accurate.
    8087/80287 behavior: Range of operand is restricted (ST(0) < π/4); the operand must be reduced to range using FPREM.
    Impact on software: None.
    Reason for the difference: Upgrade.
  Stack overflow:
    80387 behavior: After a stack overflow when the invalid-operation exception is masked, both ST and ST(1) contain quiet NaNs.
    8087/80287 behavior: After a stack overflow when the invalid-operation exception is masked, the original operand remains unchanged, but is pushed to ST(1).
    Reason for the difference: IEEE Standard 754 compatibility.

FSIN, FCOS, FSINCOS
  80387 behavior: Perform three common trigonometric functions.
  8087/80287 behavior: Do not exist.
  Impact on software: None.
  Reason for the difference: Upgrade.

FPATAN
  80387 behavior: Range of operands is unrestricted.
  8087/80287 behavior: ST(0) must be smaller than ST(1).
  Impact on software: None.
  Reason for the difference: Upgrade.

F2XM1
  80387 behavior: Wider range of operand (-1 ≤ ST(0) ≤ +1).
  8087/80287 behavior: The supported operand range is 0 ≤ ST(0) ≤ 0.5.
  Impact on software: None.
  Reason for the difference: Upgrade.

FLD extended-real
  80387 behavior: Does not report the denormal exception because the instruction is not arithmetic.
  8087/80287 behavior: Reports the denormal exception.
  Impact on software: None.
  Reason for the difference: Upgrade.

FXTRACT
  80387 behavior: If the operand is zero, the zero-divide exception is reported and ST(1) is -∞. If the operand is +∞, no exception is reported.
  8087/80287 behavior: If the operand is zero, ST(1) is zero and no exception is reported. If the operand is +∞, the invalid-operation exception is reported.
  Impact on software: None. Software usually bypasses zero and ∞.
  Reason for the difference: IEEE 754 recommendation, to fully support the logb function.

FLD constant
  80387 behavior: Rounding control is in effect.
  8087/80287 behavior: Rounding control is not in effect.
  Impact on software: Results are the same as for the 8087/80287 when rounding control is set to round to zero, round to -∞, and (in the case of FLDL2T) round to nearest. Results differ by one in the least significant bit of the significand in round to +∞ and round to nearest (excluding FLDL2T). FLD1 and FLDZ are always the same.
  Reason for the difference: IEEE 754 recommendation.

FLD single/double precision (denormal operand)
  80387 behavior: Loading a denormal causes the number to be converted to extended precision (because it is put on the stack).
  8087/80287 behavior: Loading a denormal causes the number to be converted to an unnormal.
  Impact on software: If the next instruction is FXTRACT or FXAM, the 80387 will give a different result than the 80287/8087.
  Reason for the difference: IEEE Standard 754 compatibility.

FLD single/double precision (signaling NaN)
  80387 behavior: When loading a signaling NaN, raises the invalid-operation exception.
  8087/80287 behavior: Does not raise an exception when loading a signaling NaN.
  Impact on software: The exception handler needs to be updated to handle this condition.
  Reason for the difference: IEEE Standard 754 compatibility.

FSETPM
  80387 behavior: Treated as FNOP (no operation).
  8087/80287 behavior: Informs the 80287 that the system is in protected mode.
  Impact on software: None.
  Reason for the difference: The 80386 handles all addressing and exception-pointer information, whether in protected mode or not.

FXAM
  80387 behavior: When encountering an empty register, the 80387 will not generate combinations of C3-C0 equal to 1101 or 1111.
  8087/80287 behavior: May generate these combinations, among others.
  Impact on software: None.
  Reason for the difference: Upgrade, to provide repeatable results.

All Transcendental Instructions
  80387 behavior: May generate different results in the round-up bit of the status word.
  8087/80287 behavior: The round-up bit of the status word is undefined for these instructions.
  Impact on software: None.
  Reason for the difference: Upgrade, to signal rounding status.

Appendix D Compatibility Between the 80387 and the 8087

The 80386/80387 operating in real-address mode will execute 8087 programs without major modification. However, because of differences in the handling of numeric exceptions between the 80387 NPX and the 8087 NPX, exception-handling routines may need to be changed.
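Ported 8087 code most often needs changes in its exception handlers. One change an 80387-aware handler can make is to use status-word bit 6 (SF), described in Appendix C, to separate stack faults from other invalid operations. The following Python sketch decodes a status-word value. It assumes the value was stored with FSTSW or FNSTSW, and it relies on the 80387 convention that C1 (bit 9) is 1 for stack overflow and 0 for stack underflow. The function name classify_invalid is hypothetical. On an 8087 or 80287, bit 6 is reserved, so this distinction applies only to 80387 status words.

```python
# Sketch: classify an invalid-operation exception from an 80387 status word.
# Assumes sw is the 16-bit value stored by FSTSW/FNSTSW.
IE = 1 << 0   # invalid operation
SF = 1 << 6   # stack fault (80387); reserved on the 8087/80287
C1 = 1 << 9   # condition code bit 1: 1 = stack overflow, 0 = stack underflow

def classify_invalid(sw):
    """Return the cause of a flagged invalid-operation exception, or None."""
    if not sw & IE:
        return None                      # IE clear: nothing to classify
    if sw & SF:                          # IE came from a stack overflow/underflow
        return "stack overflow" if sw & C1 else "stack underflow"
    return "invalid operand"             # IE raised by a numeric operation

print(classify_invalid(0x0241))          # stack overflow (IE, SF, and C1 set)
```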
This appendix summarizes the additional differences between the 80387 NPX and the 8087 NPX (other than those already included in Appendix C), and provides details showing how 8087 programs can be ported to the 80387.

1. The 80387 signals exceptions through a dedicated ERROR# line to the 80386; no interrupt controller is needed for this purpose. The 8087 requires an interrupt controller (8259A) to interrupt the CPU when an unmasked exception occurs. Therefore, any interrupt-controller-oriented instructions in numeric exception handlers for the 8087 should be deleted.

2. The 8087 instructions FENI/FNENI and FDISI/FNDISI perform no useful function in the 80387. If the 80387 encounters one of these opcodes in its instruction stream, the instruction is effectively ignored: none of the 80387 internal states is updated. While 8087 code containing these instructions may be executed on the 80387, it is unlikely that the exception-handling routines containing these instructions will be completely portable to the 80387.

3. In real mode and protected mode (not including virtual 8086 mode), interrupt vector 16 must point to the numeric exception handling routine. In virtual 8086 mode, the V86 monitor can be programmed to accommodate a different location of the interrupt vector for numeric exceptions.

4. The ESC instruction address saved in the 80386/80387 or 80386/80287 includes any leading prefixes before the ESC opcode. The corresponding address saved in the 8086/8087 does not include leading prefixes.

5. In protected mode (not including virtual 8086 mode), the format of the 80387's saved instruction and address pointers is different than for the 8087. The instruction opcode is not saved in protected mode; exception handlers will have to retrieve the opcode from memory if needed.

6. Interrupt 7 will occur in the 80386 when executing ESC instructions with either the TS (task switched) or the EM (emulation) bit of the 80386 MSW set (TS=1 or EM=1).
If TS is set, then a WAIT instruction will also cause interrupt 7. An exception handler should be included in 80387 code to handle these situations.

7. Interrupt 9 will occur if the second or subsequent words of a floating-point operand fall outside a segment's size. Interrupt 13 will occur if the starting address of a numeric operand falls outside a segment's size. An exception handler should be included to report these programming errors.

8. Except for the processor control instructions, all of the 80387 numeric instructions are automatically synchronized by the 80386 CPU: the 80386 automatically waits until all operands have been transferred between the 80386 and the 80387 before executing the next ESC instruction. No explicit WAIT instructions are required to assure this synchronization. For the 8087 used with 8086 and 8088 processors, explicit WAITs are required before each numeric instruction to ensure synchronization. Although 8087 programs having explicit WAIT instructions will execute perfectly on the 80387 without reassembly, these WAIT instructions are unnecessary.

9. Since the 80387 does not require WAIT instructions before each numeric instruction, the ASM386 assembler does not automatically generate these WAIT instructions. The ASM86 assembler, however, automatically precedes every ESC instruction with a WAIT instruction. Although numeric routines generated using the ASM86 assembler will generally execute correctly on the 80386/20, reassembly using ASM386 may result in a more compact code image and faster execution.

The processor control instructions for the 80387 may be coded using either a WAIT or No-WAIT form of mnemonic. The WAIT forms of these instructions cause ASM386 to precede the ESC instruction with a CPU WAIT instruction, in the identical manner as does ASM86.

10. The address of a memory operand stored by FSAVE or FSTENV is undefined if the previous ESC instruction did not refer to memory.

11. Because the 80387 automatically normalizes denormal numbers when possible, an 8087 program that uses the denormal exception solely to normalize denormal operands can run on an 80387 by masking the denormal exception. The 8087 denormal exception handler would not be used by the 80387 in this case. A numerics program runs faster when the 80387 performs normalization of denormal operands. A program can detect at run-time whether it is running on an 80387 or 8087/80287 and disable the denormal exception when an 80387 is used.

Appendix E 80387 80-Bit CHMOS III Numeric Processor Extension

For Advance Information on the Intel 80387, please consult Appendix E of the printed version of this book or the 80387 Data Sheet, order number 231920.

Appendix F PC/AT-Compatible 80387 Connection

The PC/AT uses a nonstandard scheme to report 80287 exceptions to the 80286. When replicating the PC/AT coprocessor interface in 80386-based systems, the PC/AT interface cannot be used in exactly the same way; however, this appendix outlines a similar interface that works on 80386/80387 systems and maintains compatibility with the nonstandard PC/AT scheme.

Note that the interface outlined here does not represent a new interface standard; it needs to be incorporated in AT-compatible designs only because the 80286 and 80287 in the PC/AT are not connected according to the standards defined by Intel. The standard 80386/80387 connection recommended by Intel in the 80387 Data Sheet functions properly; the 80386 implementation has not been and will not be altered.

F.1 The PC/AT Interface

In the PC/AT, the ERROR# input to the 80286 is tied inactive (high) permanently. The ERROR# output of the 80287 is tied to an interrupt port (IRQ13). This interrupt replaces exception signaling via the 80286's ERROR# input.
To guarantee (in the case of an 80287 exception) that INTR 13 will be serviced prior to the execution of any further 80287 instructions, an edge-triggered flip-flop latches BUSY# using ERROR# as a clock. The output of this latch is ORed with the BUSY# output of the 80287 and drives the BUSY# input of the 80286. This PC/AT scheme effectively delays deactivation of BUSY# at the 80286 whenever an 80287 ERROR# is signaled. Since the 80286 BUSY# input remains active after an exception, the 80286 interrupt 13 handler is guaranteed to execute before any other 80287 instructions may begin. The interrupt 13 handler clears the BUSY# latch (via a write to a special I/O port), thus allowing execution of 80287 instructions to proceed. The interrupt 13 handler then branches to the NMI handler, where the user-defined numerics exception handler resides in PC-compatible systems.

The use of an interrupt guarantees that an exception from a coprocessor instruction will be detected. Latching BUSY# guarantees that any coprocessor instruction (except FINIT, FSETPM, and FCLEX) following the instruction that raised the exception will not be executed before the NMI handler is executed. This PC/AT scheme approximates the exception reporting scheme between the 8087 and 8088 in the original PC.

F.2 How to Achieve the Same Effect in an 80386 System

The 80386 can use a PC/AT-compatible interface to communicate with an 80387 provided that, when an NPX exception occurs, BUSY# active time is extended and PEREQ is reactivated only after 80387 BUSY# has gone inactive. The 80387 is left active (tying STEN high) at all times. Also, the 80386 and 80387 must be reset by the same RESET signal.

The reactivation of PEREQ for the 80386 is needed for store instructions (for example, FST mem) because the 80387 drops PEREQ once it signals an exception. Because the 80386 has not yet recognized the occurrence of the exception, it still expects the data transfers to complete via PEREQ reactivation.
Disabling the 80387 is not necessary: the dummy data-transfer cycles directed to the 80387 when PEREQ is externally reactivated for the 80386 do not disturb the operation of the 80387, and it is permissible for the 80386 to receive undefined data during these I/O read cycles. The interrupt 13 handler should remove the extension of BUSY# and the reactivation of PEREQ by writing to PC/AT-compatible hardware at I/O port F0H.

Glossary of 80387 and Floating-Point Terminology

This glossary defines many terms that have precise technical meanings as specified in the IEEE 754 Standard or as specified in this manual. Where these terms are used, they have been italicized to emphasize the precision of their meanings.

Base (1) a term used in logarithms and exponentials. In both contexts, it is a number that is being raised to a power. The two equations $y = \log_b x$ and $b^{y} = x$ are equivalent.

Base (2) a number that defines the representation being used for a string of digits. Base 2 is the binary representation; base 10 is the decimal representation; base 16 is the hexadecimal representation. In each case, each digit position has base times the significance of the position to its right.

Bias a constant that is added to the true exponent of a real number to obtain the exponent field of that number's floating-point representation in the 80387. To obtain the true exponent, subtract the bias from the given exponent. For example, the single real format has a bias of 127 whenever the given exponent is nonzero. If the 8-bit exponent field contains 10000011, which is 131, the true exponent is 131 - 127, or +4.

Biased Exponent the exponent as it appears in a floating-point representation of a number. The biased exponent is interpreted as an unsigned number. In the above example, 131 is the biased exponent.
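To make the Bias example concrete, the sketch below reads the exponent field of a value stored in single format. It is a Python illustration that relies on the `struct` module's `"<f"` encoding, which is IEEE 754 single precision and so shares the layout of the 80387's single format; the function name `single_exponent` is hypothetical, and it assumes a normal (nonzero, non-denormal, finite) input.

```python
import struct

SINGLE_BIAS = 127  # bias of the single format's 8-bit exponent field


def single_exponent(x: float) -> tuple:
    """Return (biased_exponent, true_exponent) of x stored in single format.

    Assumes x is a normal single-format number (not zero, denormal,
    infinity, or NaN), so that the bias of 127 applies.
    """
    # Reinterpret the 4 bytes of the single-format encoding as an unsigned int.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    biased = (bits >> 23) & 0xFF  # exponent field: bits 30..23
    return biased, biased - SINGLE_BIAS
```

For 16.0 (1.0 × 2^4) this returns (131, 4), and 131 is 10000011 in binary, matching the worked example in the Bias entry.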
Binary Coded Decimal a method of storing numbers that retains a base 10 representation. Each decimal digit occupies 4 full bits (one hexadecimal digit). The hexadecimal values A through F (1010 through 1111) are not used. The 80387 supports a packed decimal format that consists of 9 bytes of binary coded decimal (18 decimal digits) and one sign byte.

Binary Point an entity just like a decimal point, except that it exists in binary numbers. Each binary digit to the right of the binary point is multiplied by a successively more negative power of two ($2^{-1}$, $2^{-2}$, and so on).

C3–C0 the four "condition code" bits of the 80387 status word. These bits are set to certain values by the compare, test, examine, and remainder functions of the 80387.

Characteristic a term used for some non-Intel computers, meaning the exponent field of a floating-point number.

Chop to set one or more low-order bits of a real number to zero, yielding the nearest representable number in the direction of zero.

Condition Code the four bits of the 80387 status word that indicate the results of the compare, test, examine, and remainder functions of the 80387.

Control Word a 16-bit 80387 register that the user can set to determine the modes of computation the 80387 will use and the exception interrupts that will be enabled.

Denormal a special form of floating-point number. On the 80387, a denormal is defined as a nonzero number that has a biased exponent of zero. By providing a significand with leading zeros, the range of possible negative exponents can be extended by the number of bits in the significand. Each leading zero is a bit of lost accuracy, so the extended exponent range is obtained by reducing significance.

Double Extended the Standard's term for the 80387's extended format, with more exponent and significand bits than the double format and an explicit integer bit in the significand.
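The Denormal entry notes that leading zeros in the significand trade accuracy for exponent range. Normalizing reverses that trade: shifting the significand left and decrementing the exponent leaves the value unchanged. The following Python sketch shows the idea for an extended-format significand with an explicit integer bit (bit 63); the function name `normalize` is hypothetical, and because Python integers are unbounded, the sketch does not model the exponent-range limits that real hardware must respect.

```python
def normalize(exponent: int, significand: int) -> tuple:
    """Normalize an extended-format value (hypothetical helper).

    The value is significand * 2**(exponent - 63), where bit 63 of the
    64-bit significand is the explicit integer bit. Each left shift of the
    significand is offset by decrementing the exponent, so the value never
    changes; the loop stops once the integer bit is one.
    """
    if not 0 < significand < (1 << 64):
        raise ValueError("significand must be a nonzero 64-bit value")
    while not (significand >> 63) & 1:
        significand <<= 1
        exponent -= 1
    return exponent, significand
```

For example, assuming a denormal is read with the format's minimum exponent of -16382, a significand of 2^62 (one leading zero) normalizes to a significand of 2^63 with exponent -16383, one below the smallest normal extended exponent.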
Double Format a floating-point format supported by the 80387 that consists of a sign, an 11-bit biased exponent, an implicit integer bit, and a 52-bit significand, for a total of 64 explicit bits.

Environment the 14 or 28 bytes (depending on addressing mode) of 80387 registers affected by the FSTENV and FLDENV instructions. It encompasses the entire state of the 80387 except for the 8 registers of the 80387 stack. Included are the control word, status word, tag word, and the instruction, opcode, and operand information provided by interrupts.

Exception any of the six conditions (invalid operation, denormal, numeric overflow, numeric underflow, zero-divide, and precision) detected by the 80387 that may be signaled by status flags or by traps.

Exception Pointers the data maintained by the 80386 to help exception handlers identify the cause of an exception. This data consists of a pointer to the most recently executed ESC instruction and a pointer to the memory operand of this instruction, if it had a memory operand. An exception handler can use the FSTENV and FSAVE instructions to access these pointers.

Exponent (1) any number that indicates the power to which another number is raised.

Exponent (2) the field of a floating-point number that indicates the magnitude of the number. This falls under the more general definition (1), except that a bias must sometimes be subtracted to obtain the correct power.

Extended Format the 80387's implementation of the Standard's double extended format. Extended format is the main floating-point format used by the 80387. It consists of a sign, a 15-bit biased exponent, and a significand with an explicit integer bit and 63 fractional-part bits.

Floating-Point of or pertaining to a number that is expressed as a base, a sign, a significand, and a signed exponent. The value of the number is the signed product of its significand and the base raised to the power of the exponent.
Floating-point representations are more versatile than integer representations in two ways. First, they include fractions. Second, their exponent parts allow a much wider range of magnitude than is possible with fixed-length integer representations.

Gradual Underflow a method of handling the underflow error condition that minimizes the loss of accuracy in the result. If there is a denormal number that represents the correct result, that denormal is returned. Thus, digits are lost only to the extent of denormalization. Most computers return zero when underflow occurs, losing all significant digits.

Implicit Integer Bit a part of the significand in the single real and double real formats that is not explicitly given. In these formats, the entire given significand is considered to be to the right of the binary point. A single implicit integer bit to the left of the binary point is always one, except in one case: when the exponent is the minimum (biased exponent is zero), the implicit integer bit is zero.

Indefinite a special value that is returned by functions when the inputs are such that no other sensible answer is possible. For each floating-point format there exists one quiet NaN that is designated as the indefinite value. For binary integer formats, the negative number furthest from zero is often considered the indefinite value. For the 80387 packed decimal format, the indefinite value contains all 1's in the sign byte and the uppermost digits byte.

Inexact the Standard's term for the 80387's precision exception.

Infinity a value that has greater magnitude than any integer or any real number. It is often useful to consider infinity as another number, subject to special rules of arithmetic. All three Intel floating-point formats provide representations for +∞ and -∞.

Integer a number (positive, negative, or zero) that is finite and has no fractional part.
Integer can also mean the computer representation for such a number: a sequence of data bytes, interpreted in a standard way. It is perfectly reasonable for integers to be represented in a floating-point format; this is what the 80387 does whenever an integer is pushed onto the 80387 stack.

Integer Bit a part of the significand in floating-point formats. In these formats, the integer bit is the only part of the significand considered to be to the left of the binary point. The integer bit is always one, except in one case: when the exponent is the minimum (biased exponent is zero), the integer bit is zero. In the extended format the integer bit is explicit; in the single format and double format the integer bit is implicit, i.e., it is not actually stored in memory.

Invalid Operation the exception condition for the 80387 that covers all cases not covered by other exceptions. Included are 80387 stack overflow and underflow, NaN inputs, illegal infinite inputs, out-of-range inputs, and inputs in unsupported formats.

Long Integer an integer format supported by the 80387 that consists of a 64-bit two's complement quantity.

Long Real an older term for the 80387's 64-bit double format.

Mantissa a term used with some non-Intel computers for the significand of a floating-point number.

Masked a term that applies to each of the six 80387 exceptions I, D, Z, O, U, and P. An exception is masked if the corresponding bit in the 80387 control word is set to one. If an exception is masked, the 80387 will not generate an interrupt when the exception condition occurs; it will instead provide its own exception recovery.

Mode one of the control word fields "rounding control" and "precision control", which programs can set, sense, save, and restore to control the execution of subsequent arithmetic operations.

NaN an abbreviation for "Not a Number"; a floating-point quantity that does not represent any numeric or infinite quantity. NaNs should be returned by functions that encounter serious errors.
If created during a sequence of calculations, they are transmitted to the final answer and can contain information about where the error occurred.

Normal the representation of a number in a floating-point format in which the significand has an integer bit of one (either explicit or implicit).

Normalize to convert a denormal representation of a number to a normal representation.

NPX Numeric Processor Extension. This is the 80387, 80287, or 8087.

Overflow an exception condition in which the correct answer is finite but has a magnitude too great to be represented in the destination format. This kind of overflow (also called numeric overflow) is not to be confused with stack overflow.

Packed Decimal an integer format supported by the 80387. A packed decimal number is a 10-byte quantity, with nine bytes of 18 binary coded decimal digits and one byte for the sign.

Pop to remove from a stack the last item that was placed on the stack.

Precision the effective number of bits in the significand of the floating-point representation of a number.

Precision Control an option, programmed through the 80387 control word, that allows all 80387 arithmetic to be performed with reduced precision. Because no speed advantage results from this option, its only use is for strict compatibility with the Standard and with other computer systems.

Precision Exception an 80387 exception condition that results when a calculation does not return an exact answer. This exception is usually masked and ignored; it is used only in extremely critical applications, when the user must know if the results are exact. The precision exception is called inexact in the Standard.

Pseudozero one of a set of special values of the extended real format. The set consists of numbers with a zero significand and an exponent that is neither all zeros nor all ones. Pseudozeros are not created by the 80387 but are handled correctly when encountered as operands.
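The special extended-format encodings defined in this glossary (normal, denormal, pseudozero, and the unsupported unnormal, pseudo-NaN, and pseudoinfinity) can be told apart from the exponent field and the explicit integer bit alone. The Python sketch below assumes the 10-byte value is stored little-endian, with the 64-bit significand in the low eight bytes and the sign and 15-bit exponent in the top two; `classify_extended` and `make_extended` are hypothetical helper names.

```python
def make_extended(sign: int, exponent: int, significand: int) -> bytes:
    """Pack fields into the assumed 10-byte little-endian extended layout."""
    sign_exp = (sign << 15) | exponent
    return significand.to_bytes(8, "little") + sign_exp.to_bytes(2, "little")


def classify_extended(raw: bytes) -> str:
    """Classify a 10-byte extended-format value by its encoding."""
    if len(raw) != 10:
        raise ValueError("extended format values are 10 bytes")
    significand = int.from_bytes(raw[:8], "little")  # includes explicit integer bit
    exponent = int.from_bytes(raw[8:], "little") & 0x7FFF  # drop the sign bit
    integer_bit = significand >> 63
    fraction = significand & ((1 << 63) - 1)
    if exponent == 0:
        # This sketch does not single out exponent-zero values whose
        # integer bit is set; they are reported as denormals.
        return "zero" if significand == 0 else "denormal"
    if exponent == 0x7FFF:
        if integer_bit == 0:
            # Unsupported formats on the 80387.
            return "pseudoinfinity" if fraction == 0 else "pseudo-NaN"
        if fraction == 0:
            return "infinity"
        # Most significant fraction bit: 1 = quiet NaN, 0 = signaling NaN.
        return "quiet NaN" if fraction >> 62 else "signaling NaN"
    if significand == 0:
        return "pseudozero"  # accepted as an operand, never created
    return "normal" if integer_bit else "unnormal"  # unnormals are unsupported
```

Note that the pseudozero test must come before the unnormal test, because a zero significand with a nonzero exponent also has its integer bit clear.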
Quiet NaN a NaN in which the most significant bit of the fractional part of the significand is one. By convention, these NaNs can undergo certain operations without causing an exception.

Real any finite value (negative, positive, or zero) that can be represented by a (possibly infinite) decimal expansion. Reals can be represented as the points of a line marked off like a ruler. The term real can also refer to a floating-point number that represents a real value.

Short Integer an integer format supported by the 80387 that consists of a 32-bit two's complement quantity. Short integer is not the shortest 80387 integer format; the 16-bit word integer is.

Short Real an older term for the 80387's 32-bit single format.

Signaling NaN a NaN that causes an invalid-operation exception whenever it enters into a calculation or comparison, even a nonordered comparison.

Significand the part of a floating-point number that consists of the most significant nonzero bits of the number, if the number were written out in an unlimited binary format. The significand is composed of an integer bit and a fraction. The integer bit is implicit in the single format and double format. The significand is considered to have a binary point after the integer bit; the binary point is then moved according to the value of the exponent.

Single Extended a floating-point format, required by the Standard, that provides greater precision than single; it also provides an explicit integer bit in the significand. The 80387's extended format meets the single extended requirement as well as the double extended requirement.

Single Format a floating-point format supported by the 80387 that consists of a sign, an 8-bit biased exponent, an implicit integer bit, and a 23-bit significand, for a total of 32 explicit bits.

Stack Fault a special case of the invalid-operation exception, indicated by a one in the SF bit of the status word. This condition usually results from stack underflow or overflow.
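The Quiet NaN, Signaling NaN, and Single Format entries combine into a simple classification rule for 32-bit patterns. The Python sketch below assumes the single-format layout given above (sign in bit 31, biased exponent in bits 30 through 23, 23 stored significand bits); the function names `single_bits` and `classify_single` are hypothetical.

```python
import struct


def single_bits(x: float) -> int:
    """Return the 32-bit single-format encoding of x as an unsigned int."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits


def classify_single(bits: int) -> str:
    """Classify a 32-bit single-format bit pattern."""
    exponent = (bits >> 23) & 0xFF  # 8-bit biased exponent
    fraction = bits & 0x7FFFFF      # 23 stored significand bits
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # Most significant fraction bit: 1 = quiet NaN, 0 = signaling NaN.
        return "quiet NaN" if fraction & 0x400000 else "signaling NaN"
    return "normal"
```

NaN test patterns are best given as raw integers, since converting a signaling NaN through a host float may quiet it. The pattern FFC00000H, the single-format indefinite, classifies as a quiet NaN, consistent with the Indefinite entry.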
Standard "IEEE Standard for Binary Floating-Point Arithmetic," ANSI/IEEE Std 754-1985.

Status Word a 16-bit 80387 register that can be manually set but that is usually controlled by side effects of 80387 instructions. It contains condition codes, the 80387 stack pointer, busy and interrupt bits, and exception flags.

Tag Word a 16-bit 80387 register that is automatically maintained by the 80387. For each space in the 80387 stack, it tells whether the space is occupied by a number and, if so, what kind of number.

Temporary Real an older term for the 80387's 80-bit extended format.

Tiny of or pertaining to a floating-point number that is so close to zero that its exponent is smaller than the smallest exponent that can be represented in the destination format.

TOP the three-bit field of the status word that indicates which 80387 register is the current top of stack.

Transcendental one of a class of functions for which polynomial formulas are always approximate, never exact for more than isolated values. The 80387 supports trigonometric, exponential, and logarithmic functions; all are transcendental.

Two's Complement a method of representing integers. If the uppermost bit is zero, the number is nonnegative, with the value given by the rest of the bits. If the uppermost bit is one, the number is negative, with the value obtained by subtracting $2^{n}$, where $n$ is the bit count, from the unsigned value of all the bits. For example, the 8-bit number 11111100 is -4, obtained by subtracting $2^{8}$ from 252.

Unbiased Exponent the true value that tells how far and in which direction to move the binary point of the significand of a floating-point number. For example, if a single-format exponent is 131, subtracting the bias 127 gives the unbiased exponent +4. Thus, the real number being represented is the significand with the binary point shifted 4 bits to the right.
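The Two's Complement rule translates directly into code. This Python sketch (the function name `from_twos_complement` is hypothetical) reproduces the 11111100 example and applies equally to the 16-, 32-, and 64-bit integer formats:

```python
def from_twos_complement(value: int, bit_count: int) -> int:
    """Interpret the low bit_count bits of value as a two's complement integer."""
    value &= (1 << bit_count) - 1        # keep only bit_count bits
    if value >> (bit_count - 1):         # uppermost bit is one: negative
        return value - (1 << bit_count)  # subtract 2**bit_count
    return value
```

For a 16-bit word integer, 8000H decodes to -32768, the "negative number furthest from zero" that the Indefinite entry mentions.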
Underflow an exception condition in which the correct answer is nonzero but has a magnitude too small to be represented as a normal number in the destination floating-point format. The Standard specifies that an attempt be made to represent the number as a denormal. This denormalization may result in a loss of significant bits from the significand. This kind of underflow (also called numeric underflow) is not to be confused with stack underflow.

Unmasked a term that applies to each of the six 80387 exceptions I, D, Z, O, U, and P. An exception is unmasked if the corresponding bit in the 80387 control word is set to zero. If an exception is unmasked, the 80387 will generate an interrupt when the exception condition occurs. You can provide an interrupt routine that customizes your exception recovery.

Unnormal an extended real representation in which the explicit integer bit of the significand is zero and the exponent is nonzero. Unnormal values are not supported by the 80387; they cause the invalid-operation exception when encountered as operands.

Unsupported Format any number representation that is not recognized by the 80387. This includes several formats that are recognized by the 8087 and 80287, namely pseudo-NaN, pseudoinfinity, and unnormal.

Word Integer an integer format supported by both the 80386 and the 80387 that consists of a 16-bit two's complement quantity.

Zero Divide an exception condition in which the inputs are finite but the correct answer, even with an unlimited exponent, has infinite magnitude.
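The Masked and Unmasked entries, together with the earlier advice that an 8087 program can run on an 80387 with the denormal exception masked, come down to setting or clearing one bit of the control word. The Python sketch below manipulates a control word value as an integer. The bit positions (IM in bit 0 through PM in bit 5) are the 80387 exception-mask positions; the function names and the `is_80387` flag are hypothetical, and how a program detects the processor type is not shown. In a real program the word would be read with FSTCW and written back with FLDCW.

```python
# Exception-mask bits of the 80387 control word (1 = masked).
IM = 1 << 0  # invalid operation
DM = 1 << 1  # denormalized operand
ZM = 1 << 2  # zero divide
OM = 1 << 3  # overflow
UM = 1 << 4  # underflow
PM = 1 << 5  # precision (inexact)


def mask(cw: int, bits: int) -> int:
    """Return control word cw with the given exceptions masked."""
    return (cw | bits) & 0xFFFF


def unmask(cw: int, bits: int) -> int:
    """Return control word cw with the given exceptions unmasked."""
    return cw & ~bits & 0xFFFF


def configure_denormal(cw: int, is_80387: bool) -> int:
    """Hypothetical policy: let an 80387 normalize denormal operands itself
    (exception masked); on an 8087/80287, keep the exception unmasked so an
    existing normalizing handler still runs."""
    return mask(cw, DM) if is_80387 else unmask(cw, DM)
```

For example, starting from 037FH (the control word FINIT establishes, with all six exceptions masked), unmask(cw, ZM | OM) yields 0373H.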