Class # |
Slides and Notes |
Preparation for Class |
References |
7
1/29/2025 |
Speed – Thinking About Optimization |
|
|
8
2/3/2025 |
Speed – Profiling
Class Discussion Notes:
Feb 3 – Histogrammer Optimizations |
- Speed Analysis – 3 (1:01:08)
- Intro
- Optimization Books (3:27)
- Finding the Right Code to Optimize (8:44)
- Sampling a Program (9:55)
- Toolchain Data Flow for Profiling Support (31:33)
- Retrieving the Return Address and Finding the Function (37:35)
- Profiler API and Use (44:48)
- Demonstration (46:46)
- Examining RegionCount Table with Debugger (53:32)
- Examining and Validating Sorted Region Count Info on LCD (57:11)
- What do the numbers tell us to do? (1:00:18)
- Speed Analysis 4 – Profiler and Lost Addresses (~5:09)
|
- GetRegions is available on GitHub at Tools/GetRegions. Please read the manual in that directory (Profiling Tools.pdf) for instructions on installation, use, and troubleshooting.

- GitHub: Speed/Histogrammer has the source code for in-class discussion
|
Speed – Optimization Tour with Spherical Geometry
Class Discussion Notes:
Feb 5 – Selecting Optimization Ideas |
- Speed Optimization 1 – Low-Level Optimizations (52:30)
- Intro and Fundamentals,
- Compiler Optimization Settings (9:14),
- Review of Example Program for Optimization (13:01),
- Version 1 (Base) Results (17:25),
- Reference: Website with Bearing Calculation (19:00),
- Version 1 Profile (20:12),
- What Should We Do About __aeabi_dmul? (23:24),
- Slides on Automatic Type Promotion and Floating-Point Math (30:24),
- Rebuild without --fpmode=fast (32:12),
- Who Is Calling the Double-Precision Floating-Point Math Functions? (34:44),
- We’re Asking the Compiler to Optimize, Right? (35:32),
- Version 2: Single-Precision Floating-Point Math (38:10),
- Version 2 Results (39:10),
- Version 2 Object Code Examination (Operator Precedence and Order of Evaluation) (42:00),
- Version 3: Parenthesize PI/180 Operations (49:23),
- Version 3 Results (49:59)
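
The type-promotion and parenthesization steps in Versions 1 through 3 can be illustrated with a minimal C sketch. The function names and the `PI_F` macro below are hypothetical and are not taken from the course code. The sketch assumes a degrees-to-radians conversion: a `double` constant promotes the whole expression to double precision (which on a core without an FPU is done by library helpers such as `__aeabi_dmul`), and without parentheses the compiler generally must perform a run-time multiply and divide because floating-point arithmetic is not associative.

```c
/* Hypothetical names for illustration; not the course's actual code. */
#define PI_F 3.14159265f

/* Double constant: deg is promoted to double, so the multiply and
   divide are both done in double precision. */
float deg_to_rad_v1(float deg) {
    return (float)(deg * 3.14159265358979 / 180);
}

/* Single-precision constants, but evaluated left to right as
   (deg * PI_F) / 180.0f: one multiply and one divide at run time. */
float deg_to_rad_v2(float deg) {
    return deg * PI_F / 180.0f;
}

/* Parenthesized: (PI_F / 180.0f) is a constant expression the compiler
   can fold at build time, leaving a single run-time multiply. */
float deg_to_rad_v3(float deg) {
    return deg * (PI_F / 180.0f);
}
```

All three return nearly the same value; the difference is in how much run-time floating-point work the compiler has to emit.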
|
|
9
2/5/2025 |
- Speed Optimization 2 – Low-Level Optimizations (1:10:36)
- Fixing the Bug in the Bearing Code (0:00),
- Recap (1:22),
- Version 4: PI_OVER_180 for Less Floating-Point Multiplication (7:18),
- Version 4 Results (8:12),
- Re-Check Version 3 Results (9:01),
- Version 5: Local copies of lat and lon (10:23),
- Version 5 Results (16:18),
- Version 6: Forced Common Subexpression Elimination (20:21),
- Version 6 Results (26:18),
- Version 6: Analyzing the Object Code (28:47),
- Version 7: Forcing Compiler to Reuse Results of Cosine Call (34:46),
- Version 7 Results (39:30),
- Version 8: Force Compiler to Reuse cos(p2lat) (40:32),
- Version 8 Results (42:57),
- Version 9: Merge Calc_Distance and Calc_Bearing Functions (46:00),
- Version 9 Results (47:47),
- Version 10: Force Some Common Sub-Expression Elimination (49:01),
- Version 10 Results (51:39),
- Version 10 Analysis (54:13),
- Version 11 (in SG2 Project): Change Table to Reduce Run-Time Calculations (55:15),
- Version 11 Results (1:01:33),
- Version 12: Less is More (1:03:34),
- Version 12 Results (1:07:58),
- Version 13: Less is More, Part 2 (1:08:57),
- Version 13 Results (1:09:23)
|
- GitHub
- Speed/SpeedDemo-SG
- Speed/SpeedDemo-SG2
|
10
2/10/2025 |
Speed – Shield Optimizations |
|
|
Speed – Numerical Approximations |
- Speed Optimization 4a – Numerical Approximation Concepts (28:48)
- Intro
- Look-Up Tables
- Reducing Look-Up Table Size by Using Interpolation
- A One-Element Look-Up Table
- Polynomial Approximations
- Determining Coefficients
- Improving Accuracy without Adding Terms
- Approximating Periodic Functions
- Approximating Symmetric Functions
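
As a rough illustration of the look-up table with interpolation idea listed above, here is a minimal C sketch. The table size, step, and function name are assumptions made for this example and do not come from the lecture. It approximates sine over 0 to 90 degrees from five stored samples.

```c
/* Hypothetical 5-entry table: sin() sampled every 22.5 degrees. */
#define LUT_STEP_DEG 22.5f
#define LUT_SIZE 5

static const float sin_lut[LUT_SIZE] = {
    0.0f, 0.382683f, 0.707107f, 0.923880f, 1.0f
};

/* Approximate sin(deg) for 0 <= deg <= 90 by linear interpolation
   between the two nearest table entries. Inputs outside the range
   are clamped to the end points. */
float sin_deg_lut(float deg) {
    if (deg <= 0.0f) return sin_lut[0];
    if (deg >= 90.0f) return sin_lut[LUT_SIZE - 1];
    float pos = deg / LUT_STEP_DEG;   /* fractional table index, < 4 here */
    int i = (int)pos;                 /* lower neighbor */
    float frac = pos - (float)i;      /* distance past the lower neighbor */
    return sin_lut[i] + frac * (sin_lut[i + 1] - sin_lut[i]);
}
```

With a table this coarse the error at 30 degrees is roughly 0.01; a finer step trades memory for accuracy, and symmetry can extend the same table to the other quadrants.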
|
|
|
11
2/12/2025 |
Speed – Native Integer and Fixed-Point Math |
- Speed Optimization 5 – Native Integer and Fixed-Point Math (1:05:56)
- Native Integer Math,
- Fixed Point Math (5:44),
- Representations (6:18),
- Support Operations (13:53),
- Mathematical Operations:
- Addition/Subtraction (15:45),
- Multiplication (18:02),
- Division (20:18),
- More Examples and Simple Example Code (28:20),
- Question on Division (31:14),
- Comments on Accuracy of Division in Simple Sample Code (32:53),
- Fixed-Point Update_PID Function (33:28)
- Closed-Loop Control System Overview (33:28),
- Initial Floating-Point PID Controller Implementation (36:15),
- Fixed-Point PID Controller Overview (38:39),
- Implementation: Types, Conversions, Addition and Subtraction (41:19),
- Implementation: Multiplication (44:18),
- Fixed-Point Multiplication Performance Quirks (50:57),
- UpdatePID_FX Timing Analysis (55:53),
- Cortex-M0+ and CMSIS-DSP Support for Fixed-Point Math (58:05)
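
The representation, support, and arithmetic topics above can be sketched in C as follows. This is a minimal example assuming a signed Q16.16 format; the type and function names are hypothetical and are not the ones used in the HBLED_FXP_Controller code.

```c
#include <stdint.h>

/* Assumed format: signed Q16.16 (16 integer bits, 16 fraction bits). */
typedef int32_t fx16_16_t;
#define FX_FRAC_BITS 16
#define FX_ONE ((fx16_16_t)1 << FX_FRAC_BITS)

/* Support operations: convert to and from float. */
static inline fx16_16_t fx_from_float(float f) {
    return (fx16_16_t)(f * (float)FX_ONE);
}
static inline float fx_to_float(fx16_16_t x) {
    return (float)x / (float)FX_ONE;
}

/* Addition and subtraction work directly on the raw integers
   because both operands share the same scale factor. */
static inline fx16_16_t fx_add(fx16_16_t a, fx16_16_t b) {
    return a + b;
}

/* Multiplication: the 64-bit product carries 32 fraction bits, so
   shift right by 16 to return to Q16.16. Right-shifting a negative
   value is implementation-defined in C (arithmetic on common compilers). */
static inline fx16_16_t fx_mul(fx16_16_t a, fx16_16_t b) {
    return (fx16_16_t)(((int64_t)a * (int64_t)b) >> FX_FRAC_BITS);
}

/* Division: pre-scale the dividend by 2^16 so the quotient keeps
   16 fraction bits. The caller must ensure b != 0. */
static inline fx16_16_t fx_div(fx16_16_t a, fx16_16_t b) {
    return (fx16_16_t)(((int64_t)a * FX_ONE) / b);
}
```

For example, `fx_mul(fx_from_float(1.5f), fx_from_float(2.0f))` yields the raw value 196608, which is 3.0 in Q16.16. The 64-bit intermediate is the simple choice here; on a core with only a 32-bit multiplier it has a cost that is worth measuring.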
|
- GitHub: Speed/HBLED_FXP_Controller
- Integer and Fixed Point Math
- WinMerge for comparing folders and files
|
12
2/17/2025 |
Speed – Better Data Organization
Speed – M0+ vs. M4F |
|
- Repo Code
- SpeedDemoM4F-SG
- SpeedDemoM4F-SG
|
13
2/19/2025 |
Speed – SIMD and DSP
Speed – Shield Optimizations |
|
|