Cambridge International Science Publishing

WRITING FAST PROGRAMS: A PRACTICAL GUIDE FOR SCIENTISTS AND ENGINEERS

John Riley
DSB Scientific Consulting
616 Finch Ct. Lugoff, SC 29078-8877, USA


ORDER HERE

ISBN 1-904602-40-1

340 pages

Softback
February 2006

£32/$50
plus p&p

Order this book

CISP Homepage
Tel:+44 (0)1223 893295
Fax: +44 (0)1223 894539
E-mail

"Writing Fast Programs" provides the basic elements of code optimization and provides strategies for reducing bottlenecks in practical simulation and numerical modeling code. The target audience is scientists and engineers and students in these fields. One pre-publication reviewer called this a much-needed intermediate text to bridge the gap between existing introductory and more advance programming books aimed at scientists. "Writing Fast Programs" does not teach basic programming; some programming proficiency is assumed, along with familiarity with the basic programming terminology. Code examples are presented in C, but BASIC (as a convenient pseudo-language) examples are provided for those not familiar with C. In general, the strategies presented are not language specific and should therefore benefit a wide programming audience.
More information available from the DSB Scientific website, please click here.
Contents
Preface
Forward
Acknowledgements
List of Figures
List of Tables
Part I:The Foundations
1. Introduction to Code Optimization
1.1 Motivation for Writing High Performance Code
1.2 Scientist Programmer vs. Computer Programmer
1.3 Programming Languages
1.3.1 One Extreme: BASIC for Ease of Use
1.3.2 Another Extreme: C For Speed
1.3.3 Program Development Efficiency
1.3.4 Scripting
1.3.5 Procedural vs. Object Oriented Languages
1.4 General Thoughts on Optimization
1.4.1 Why Optimize
1.4.2 How Difficult is Optimization
1.4.3 When and What to Optimize
1.5 Overview of Samples Presented in This Book
1.6 Additional Reading
2. PC Hardware
2.1 Basic System Architecture
2.2 Numerical Representations
2.3 Addressing
2.4 Basic CPU Architecture and Introduction to ASSEMBLY Language
2.5 Code Execution and Timing
2.6 Pipelining, Speculative Execution and Branch Prediction
2.7 Optimization at the CPU Level
2.8 A Brief Historical Summary of Desktop Computer CPUs
2.8.1 Intel Processors
2.8.2 AMD Processors
2.8.3 Cyrix Processors
2.8.4 Motorola/IBM Processors
2.9 Modern Physical Memory Architectures
2.9.1 Basic Memory Architecture
2.9.2 SDRAM and DDR SDRAM Memory
2.9.3 RAMBUS Memory
3. Operating System Considerations
3.1 The Operating System in Perspective
3.2 Operating Systems and Performance
3.3 Operating System Architectures
3.4 Specific PC Operating Systems
3.4.1 MS Windows
3.4.2 Linux
3.5 A Brief Comparison of Windows and Linux Performance
3.5.1 Simple Numerical Procedure
3.5.2 Thread/Process Creation
3.5.3 Context Switching
4. Compiler Considerations
4.1 Interpreters vs. Compilers
4.2 Compiler Optimizations
4.2.1 Aliasing
4.2.2 Array Bounds Checking
4.2.3 Numerical Overflow Checking
4.2.4 Unrounded Floating Point Operations
4.2.5 FDIV Bug Checking
4.2.6 Inline and Intrinsic Functions
4.2.7 Common Sub Expressions
4.3 Using Programs Compiled with Old Compilers
4.4 C++ and Similar Compilers
4.4.1 MS Visual C++
4.4.2 MS Visual C#
4.4.3 g++
4.4.4 VectorC
4.4.5 KAI C++ Part II: Implementation
5. Data Management
5.1 Implicit Declaration and the Variant Problem
5.2 Eliminating Implicit Type Conversions
5.2.1 Type Matching to Function Calls
5.2.2 Loop Counters
5.3 Immediates, Constants and Variables
5.4 Type Specific Operators
5.5 Variable Scope
5.6 Data Organization
5.6.1 Scalars, Arrays and Hashes
5.6.2 Pointers
5.6.3 Queues and Stacks
5.6.4 Linked Lists and Trees
5.6.5 Structures/User Defined Types
5.6.6 Objects and Classes
5.7 Naming Variables, Functions, Objects and Classes
6. Function and Procedure Calling: Optimizing Program Flow
6.1 Mechanism of Function Calling and Inline Code
6.2 Programming in the Sub-Routine Style
6.3 Calling with Reduced Stack Overhead
6.4 Using Library Functions
6.5 Hand Coding vs. Using Language Functions: A Pseudo Random Number Generator
6.6 Recursive Functions: The Factorial
6.7 Function Calling Conventions
7. Loops and Vectors
7.1 General Vector Concepts and the Potential for Complex Code Structures
7.2 Loop Unrolling in a Trade-Off with Generality Matrix Multiplication
7.3 Partial Loop Unrolling Simpson’s Rule Integration
7.4 A Practical Example: Time Dependent Heat Flow
8. Programming in the RISC Style
8.1 The KISS Principle
8.2 Practical Example The Lennard Jones Energy
8.3 Practical Example The Boltzmann Factor
8.4 Other Mathematical Tricks
9. Look-Up Tables
9.1 Memory Access vs. Computation
9.2 The LUT in Action: 1 Dimensional Examples
9.2.1 Lennard Jones Energy
9.2.2 Other 1-D Examples
9.2.3 Introduced Error in the LUT Technique
9.3 The LUT in Action: 2 Dimensional Examples
9.3.1 Distance
9.3.2 Heterogeneous Lennard Jones Energy
9.4 Virtual vs. Physical Memory
10. Other Algorithm Optimization Techniques
10.1 The Use of Symmetry
10.2 Elimination of Nested Loops
10.3 Reducing Decision Logic Overhead: Bit Flag Encoding
Part III: Parallel Processing
11. Multi-Tasking Basics
11.1 Multi-Tasking Terms
11.2 Multi-Threaded Programming
11.2.1 Simple Thread Creation and Termination
11.2.2 Communicating between Threads
11.2.3 When to Use Multiple Threads
11.3 Multi-Process Programming
11.3.1 Simple Process Creation and Termination
11.3.2 Communicating Between Processes
11.3.3 When to Use Multiple Processes
12. Parallel Computation Basics
12.1 Parallel Architecture Basics
12.1.1 Multiple Execution Units in the CPU
12.1.2 Symmetric Multi-Processors
12.1.3 Single Instruction, Multiple Data (SIMD)
12.1.4 Multiple Instruction, Multiple Data (MIMD)
12.1.5 Algorithm Considerations
12.1.6 Performance Considerations in Parallel Systems
12.2 SIMD
12.2.1 Integer Array Addition with MMX
12.2.2 Floating Point SIMD and Testing for SIMD Capability
12.2.3 Vector Dot Product using SIMD
12.2.4 4x4 Matrix Multiplication using SIMD
12.2.5 Simpson’s Rule Integration using SIMD
12.2.6 A Comparison of 3dNow! to SSE SIMD Performance
12.2.7 Compilers for SIMD and Additional SIMD Extensions
12.3 MIMD
12.3.1 Networks of Workstations
12.3.2 Clusters
12.3.3 Distributed Computing
12.4 Implementing MIMD
12.4.1 Simple Parallel Computing Without Messages
12.4.2 Simple Master-Slave Parallel Computing
12.4.3 Simple Message Passing Parallel Computing
12.5 Message Passing Tools
12.5.1 Parallel Virtual Machine (PVM)
12.5.2 Message Passing Interface (MPI)
12.5.3 Other Tools
Appendix A: A List of Modern Programming Tools
Appendix B: How to Buy A High Performance PC
Appendix C: Contents of the CD-Rom
Appendix D: Complete Listing of MCGas Monte Carlo Program (Demo)
Appendix E: BASIC Listings for the Part I and Part II Demo Programs Bibliography Index