Speed

Class # | Slides and Notes | Preparation for Class | References
7
1/29/2025
Speed – Thinking About Optimization
8
2/3/2025
Speed – Profiling 

Class Discussion Notes:

Feb 3 – Histogrammer Optimizations

  • Speed Analysis – 3 (1:01:08)
    • Intro
    • Optimization Books (3:27)
    • Finding the Right Code to Optimize (8:44)
    • Sampling a Program (9:55)
    • Toolchain Data Flow for Profiling Support (31:33)
    • Retrieving the Return Address and Finding the Function (37:35)
    • Profiler API and Use (44:48)
    • Demonstration (46:46)
    • Examining RegionCount Table with Debugger (53:32)
    • Examining and Validating Sorted Region Count Info on LCD (57:11)
    • What do the numbers tell us to do? (1:00:18)
  • Speed Analysis 4 – Profiler and Lost Addresses (~5:09)
  • GetRegions is available on GitHub at Tools/GetRegions. Please read the manual in that directory (Profiling Tools.pdf) for instructions on installation, use, and troubleshooting.
  • GitHub: Speed/Histogrammer has the source code for in-class discussion
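
The sampling approach outlined above (interrupt the program periodically, retrieve the return address, find the function that contains it, and count the hit) can be sketched in a few lines of C. This is a minimal host-side illustration, not the course's profiler: the table contents, the function names, Profile_Sample and LostCount are all hypothetical, and only the idea of a per-region count table comes from the outline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region table: one entry per function, sorted by start address.
   A real table would be generated from the linker map or object file. */
typedef struct {
    uint32_t start;     /* first address of the function */
    uint32_t end;       /* one past the last address of the function */
    const char *name;   /* function name, for reporting */
} Region;

static const Region RegionTable[] = {
    { 0x00000400u, 0x00000480u, "main" },            /* made-up addresses */
    { 0x00000480u, 0x00000600u, "Process_Image" },
    { 0x00000600u, 0x00000640u, "Update_LCD" },
};
#define NUM_REGIONS (sizeof RegionTable / sizeof RegionTable[0])

static volatile uint32_t RegionCount[NUM_REGIONS]; /* samples per region */
static volatile uint32_t LostCount;                /* samples matching no region */

/* Assumed to be called from a periodic timer interrupt with the return
   address of the interrupted code. Binary search keeps the handler short. */
void Profile_Sample(uint32_t return_address) {
    size_t lo = 0, hi = NUM_REGIONS;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (return_address < RegionTable[mid].start) {
            hi = mid;
        } else if (return_address >= RegionTable[mid].end) {
            lo = mid + 1;
        } else {
            RegionCount[mid]++;   /* address falls inside this function */
            return;
        }
    }
    LostCount++;                  /* address not covered by the table */
}
```

After a run, the regions with the largest counts are where the program spends most of its time, and a large LostCount suggests the table is missing code such as library routines.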
Speed – Optimization Tour with Spherical Geometry 

Class Discussion Notes:
Feb 5 – Selecting Optimization Ideas

  • Speed Optimization 1 – Low-Level Optimizations (52:30)
    • Intro and Fundamentals,
    • Compiler Optimization Settings (9:14),
    • Review of Example Program for Optimization (13:01),
      • Version 1 (Base) Results (17:25),
      • Reference: Website with Bearing Calculation (19:00),
      • Version 1 Profile (20:12),
      • What Should We Do About __aeabi_dmul? (23:24),
      • Slides on Automatic Type Promotion and Floating-Point Math (30:24),
      • Rebuild without --fpmode=fast (32:12),
      • Who Is Calling the Double-Precision Floating-Point Math Functions? (34:44),
      • We’re Asking the Compiler to Optimize, Right? (35:32),
    • Version 2: Single-Precision Floating-Point Math (38:10),
      • Version 2 Results (39:10),
      • Version 2 Object Code Examination (Operator Precedence and Order of Evaluation) (42:00),
    • Version 3: Parenthesize PI/180 Operations (49:23),
      • Version 3 Results (49:59)
9
2/5/2025
  • Speed Optimization 2 – Low-Level Optimizations (1:10:36)
    • Fixing the Bug in the Bearing Code (0:00),
    • Recap (1:22),
    • Version 4: PI_OVER_180 for Less Floating-Point Multiplication (7:18),
      • Version 4 Results (8:12),
      • Re-Check Version 3 Results (9:01),
    • Version 5: Local copies of lat and lon (10:23),
      • Version 5 Results (16:18),
    • Version 6: Forced Common Subexpression Elimination (20:21),
      • Version 6 Results (26:18),
      • Version 6: Analyzing the Object Code (28:47),
    • Version 7: Forcing Compiler to Reuse Results of Cosine Call (34:46),
      • Version 7 Results (39:30),
    • Version 8: Force Compiler to Reuse cos(p2lat) (40:32),
      • Version 8 Results (42:57),
    • Version 9: Merge Calc_Distance and Calc_Bearing Functions (46:00),
      • Version 9 Results (47:47),
    • Version 10: Force Some Common Sub-Expression Elimination (49:01),
      • Version 10 Results (51:39),
      • Version 10 Analysis (54:13),
    • Version 11 (in SG2 Project): Change Table to Reduce Run-Time Calculations (55:15),
      • Version 11 Results (1:01:33),
    • Version 12: Less is More (1:03:34),
      • Version 12 Results (1:07:58),
    • Version 13: Less is More, Part 2 (1:08:57),
      • Version 13 Results (1:09:23)
  • GitHub
    • Speed/SpeedDemo-SG
    • Speed/SpeedDemo-SG2
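
The first few versions in the tour above deal with type promotion, evaluation order, and a precomputed PI_OVER_180 constant. The sketch below shows those three steps on a degrees-to-radians conversion. It is an illustration under assumptions, not the code from SpeedDemo-SG: the function names are hypothetical, and only PI_OVER_180 and __aeabi_dmul are names taken from the outline.

```c
#define PI          3.14159265358979f
#define PI_OVER_180 (PI / 180.0f)   /* constant expression, folded at compile time */

/* The unsuffixed literal is a double, so deg is promoted and the multiply and
   divide are done in double precision. With software floating point this
   pulls in helpers such as __aeabi_dmul. */
float deg_to_rad_promoted(float deg) {
    return deg * 3.14159265358979 / 180;
}

/* All single precision, but * and / associate left to right, so this is
   (deg * PI) / 180.0f: one multiply and one divide at run time unless the
   compiler is allowed to reassociate floating-point math. */
float deg_to_rad_unparenthesized(float deg) {
    return deg * PI / 180.0f;
}

/* The constant is grouped, so only one single-precision multiply remains. */
float deg_to_rad_fast(float deg) {
    return deg * PI_OVER_180;
}
```

All three return the same value to within single-precision rounding; the difference is in the run-time operations the compiler must emit, which is what the object code examinations in the videos look at.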
10
2/10/2025
Speed – Shield Optimizations
Speed – Numerical Approximations
  • Speed Optimization 4a – Numerical Approximation Concepts (28:48)
    • Intro
    • Look-Up Tables
      • Reducing Look-Up Table Size by Using Interpolation
      • A One-Element Look-Up Table
    • Polynomial Approximations
      • Determining Coefficients
      • Improving Accuracy without Adding Terms
      • Approximating Periodic Functions
      • Approximating Symmetric Functions
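
As an illustration of the look-up table topics above, here is a minimal sketch of a sine approximation that uses a small table, linear interpolation between entries, and the function's periodicity and symmetry to cover all angles with a quarter-wave table. The table spacing and all names are assumptions chosen for the example, not material from the lecture.

```c
#define STEP_DEG   10.0f   /* assumed table spacing in degrees */
#define TABLE_LAST 9       /* index of the last entry (90 degrees) */

/* sin() at 0, 10, ..., 90 degrees */
static const float SinTable[TABLE_LAST + 1] = {
    0.000000f, 0.173648f, 0.342020f, 0.500000f, 0.642788f,
    0.766044f, 0.866025f, 0.939693f, 0.984808f, 1.000000f
};

/* Quarter-wave look-up with linear interpolation, for 0 <= deg <= 90. */
static float sin_deg_lut(float deg) {
    if (deg <= 0.0f)  return 0.0f;
    if (deg >= 90.0f) return 1.0f;
    float scaled = deg / STEP_DEG;
    int i = (int)scaled;                  /* table entry at or below deg */
    if (i > TABLE_LAST - 1) i = TABLE_LAST - 1;
    float frac = scaled - (float)i;       /* position between entries */
    return SinTable[i] + frac * (SinTable[i + 1] - SinTable[i]);
}

/* Full-range sine in degrees, built from the quarter-wave table. */
float sin_deg(float deg) {
    while (deg < 0.0f)    deg += 360.0f;  /* periodicity: sin(x) = sin(x + 360) */
    while (deg >= 360.0f) deg -= 360.0f;
    if (deg <= 90.0f)  return  sin_deg_lut(deg);
    if (deg <= 180.0f) return  sin_deg_lut(180.0f - deg);  /* mirror symmetry */
    if (deg <= 270.0f) return -sin_deg_lut(deg - 180.0f);  /* odd symmetry */
    return -sin_deg_lut(360.0f - deg);
}
```

With 10-degree spacing the interpolation error stays below about 0.004; a finer table trades memory for accuracy, and the wrapping loops would need replacing if very large inputs are expected.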
11
2/12/2025
Speed – Native Integer and Fixed-Point Math
  • Speed Optimization 5 – Native Integer and Fixed-Point Math (1:05:56)
    • Native Integer Math,
    • Fixed Point Math (5:44),
      • Representations (6:18),
      • Support Operations (13:53),
      • Mathematical Operations:
        • Addition/Subtraction (15:45),
        • Multiplication (18:02),
        • Division (20:18),
      • More Examples and Simple Example Code (28:20),
      • Question on Division (31:14),
      • Comments on Accuracy of Division in Simple Sample Code (32:53),
    • Fixed-Point Update_PID Function (33:28)
      • Closed-Loop Control System Overview (33:28),
      • Initial Floating-Point PID Controller Implementation (36:15),
      • Fixed-Point PID Controller Overview (38:39),
      • Implementation: Types, Conversions, Addition and Subtraction (41:19),
      • Implementation: Multiplication (44:18),
      • Fixed-Point Multiplication Performance Quirks (50:57),
      • UpdatePID_FX Timing Analysis (55:53),
    • Cortex-M0+ and CMSIS-DSP Support for Fixed-Point Math (58:05)
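
The representation, support operations, and arithmetic listed above can be sketched with a signed 16.16 format. This is a minimal example under assumptions: the format choice, the type name fx16_16, and the fx_ function names are hypothetical and are not the types used in the course's UpdatePID_FX code.

```c
#include <stdint.h>

typedef int32_t fx16_16;                 /* 16 integer bits, 16 fraction bits */
#define FX_FRAC_BITS 16
#define FX_ONE ((fx16_16)1 << FX_FRAC_BITS)

/* Support operations: conversions to and from the fixed-point format. */
static inline fx16_16 fx_from_int(int32_t i)  { return (fx16_16)(i * FX_ONE); }
static inline fx16_16 fx_from_float(float f)  { return (fx16_16)(f * (float)FX_ONE); }
static inline float   fx_to_float(fx16_16 x)  { return (float)x / (float)FX_ONE; }

/* Addition and subtraction need no scaling: both operands share one format. */
static inline fx16_16 fx_add(fx16_16 a, fx16_16 b) { return a + b; }
static inline fx16_16 fx_sub(fx16_16 a, fx16_16 b) { return a - b; }

/* Multiplication: the raw product has 32 fraction bits, so shift 16 back out.
   The 64-bit intermediate avoids overflow. Right-shifting a negative value is
   implementation-defined in C; common ARM compilers shift arithmetically. */
static inline fx16_16 fx_mul(fx16_16 a, fx16_16 b) {
    return (fx16_16)(((int64_t)a * (int64_t)b) >> FX_FRAC_BITS);
}

/* Division: pre-scale the dividend so the quotient keeps 16 fraction bits.
   The caller must ensure b is not zero. */
static inline fx16_16 fx_div(fx16_16 a, fx16_16 b) {
    return (fx16_16)(((int64_t)a * FX_ONE) / b);
}
```

For example, fx_mul(fx_from_float(1.5f), fx_from_int(2)) yields the representation of 3.0. On a core whose multiply instruction produces only a 32-bit result, such as the Cortex-M0+, the 64-bit intermediate has to be built in software, which is one plausible source of the multiplication performance quirks mentioned above.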
12
2/17/2025
Speed – Better Data Organization

Speed – M0+ vs. M4F

  • Repo Code
    • SpeedDemoM4F-SG
13
2/19/2025
Speed – SIMD and DSP

Speed – Shield Optimizations