# **Shield Speed Optimizations**


### Overview

- LCD controller interface
- LCD bitmapped text
- Using SIMD for pixel data formatting


## LCD CONTROLLER INTERFACE


## LCD Controller IC – ST7789S

- Sitronix ST7789S: 240RGB x 320 dot, 262K-color single-chip TFT controller/driver with frame memory
- Datasheet: version 1.5, 2013/04, Sitronix Technology Corporation





### LCD\_Plot\_Pixel

- Set-up operations
  - Set column address: 5 writes
  - Set page (row) address: 5 writes
- Data write operation
  - Write to memory command: 1 write
  - MSB, LSB: 2 bytes, 2 writes
- Total LCD operations: 13/pixel
- Full screen update
  - 240 x 320 pixels \* 13 write operations/pixel = 998,400 (about 1 M) write operations



```
/* Set the pixel at pos to the given color. */
void LCD_Plot_Pixel(PT_T * pos, COLOR_T * color) {
  uint8_t b1, b2;

  // Column address set 0x2a
  LCD_24S_Write_Command(0x002A);      // column address set
  LCD_24S_Write_Data(0);              // start, high byte (columns are 0..239)
  LCD_24S_Write_Data(pos->X & 0xff);  // start, low byte
  LCD_24S_Write_Data(0x0000);         // end, high byte
  LCD_24S_Write_Data(0x00EF);         // end, low byte: 0x00EF

  // Page (row) address set 0x2b
  LCD_24S_Write_Command(0x002B);      // page address set
  LCD_24S_Write_Data(pos->Y >> 8);    // start, high byte
  LCD_24S_Write_Data(pos->Y & 0xff);  // start, low byte
  LCD_24S_Write_Data(0x0001);         // end, high byte
  LCD_24S_Write_Data(0x003F);         // end, low byte: 0x013F

  // Memory Write 0x2c
  // 16 bpp, 5-6-5. Assume color channel data is left-aligned
  b1 = (color->R & 0xf8) | ((color->G & 0xe0) >> 5);
  b2 = ((color->G & 0x1c) << 3) | ((color->B & 0xf8) >> 3);
  LCD_24S_Write_Command(0x002c);
  LCD_24S_Write_Data(b1);
  LCD_24S_Write_Data(b2);
}
```
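The 5-6-5 packing at the end of `LCD_Plot_Pixel` can be tried in isolation. The following is a minimal sketch, assuming a stand-in color type `RGB888_T` and a helper name `pack_rgb565` (both hypothetical, not part of the shield code); the masks and shifts are the ones used above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the COLOR_T type used by LCD_Plot_Pixel. */
typedef struct { uint8_t R, G, B; } RGB888_T;

/* Pack left-aligned 8-bit channels into the two bytes sent after RAMWR.
 *   b1 = RRRRRGGG  (five red bits, upper three bits of the green field)
 *   b2 = GGGBBBBB  (lower three bits of the green field, five blue bits) */
static void pack_rgb565(const RGB888_T *c, uint8_t *b1, uint8_t *b2) {
    *b1 = (uint8_t)((c->R & 0xf8) | ((c->G & 0xe0) >> 5));
    *b2 = (uint8_t)(((c->G & 0x1c) << 3) | ((c->B & 0xf8) >> 3));
}
```

Only the top five bits of red and blue and the top six bits of green survive, so the lowest bits of each channel are dropped rather than rounded.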

### **Commands**

9.1.22 RAMWR (2Ch): Memory Write

| Inst / Para               | D/CX | WRX | RDX | D17-8        | D7    | D6    | D5    | D4    | D3    | D2    | D1    | D0    | HEX   |
|---------------------------|------|-----|-----|--------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| RAMWR                     | 0    | 1   | 1   | -            | 0     | 0     | 1     | 0     | 1     | 1     | 0     | 0     | (2Ch) |
| 1<sup>st</sup> parameter  | 1    | 1   | 1   | D1[17]-D1[8] | D1[7] | D1[6] | D1[5] | D1[4] | D1[3] | D1[2] | D1[1] | D1[0] |       |
| ⋮                         | 1    | 1   | 1   | Dx[17]-Dx[8] | Dx[7] | Dx[6] | Dx[5] | Dx[4] | Dx[3] | Dx[2] | Dx[1] | Dx[0] |       |
| N<sup>th</sup> parameter  | 1    | 1   | 1   | Dn[17]-Dn[8] | Dn[7] | Dn[6] | Dn[5] | Dn[4] | Dn[3] | Dn[2] | Dn[1] | Dn[0] |       |

Description:

- This command is used to transfer data from the MCU to frame memory.
- When this command is accepted, the column register and the page register are reset to the start column/start page positions.
- The start column/start page positions are different in accordance with the MADCTL setting.
- Sending any other command can stop the frame write.


## Defining Rectangle Start and End Addresses

### 9.1.20 CASET (2Ah): Column Address Set

| Inst / Para               | D/CX | WRX | RDX | D17-8 | D7   | D6   | D5   | D4   | D3   | D2   | D1  | D0  | HEX   |
|---------------------------|------|-----|-----|-------|------|------|------|------|------|------|-----|-----|-------|
| CASET                     | 0    | 1   | 1   | -     | 0    | 0    | 1    | 0    | 1    | 0    | 1   | 0   | (2Ah) |
| 1<sup>st</sup> parameter  | 1    | 1   | 1   | -     | XS15 | XS14 | XS13 | XS12 | XS11 | XS10 | XS9 | XS8 |       |
| 2<sup>nd</sup> parameter  | 1    | 1   | 1   | -     | XS7  | XS6  | XS5  | XS4  | XS3  | XS2  | XS1 | XS0 |       |
| 3<sup>rd</sup> parameter  | 1    | 1   | 1   | -     | XE15 | XE14 | XE13 | XE12 | XE11 | XE10 | XE9 | XE8 |       |
| 4<sup>th</sup> parameter  | 1    | 1   | 1   | -     | XE7  | XE6  | XE5  | XE4  | XE3  | XE2  | XE1 | XE0 |       |

Description:

- The values of XS[15:0] and XE[15:0] are referred to when a RAMWR command arrives.
- Each value represents one column line in the frame memory.

### 9.1.21 RASET (2Bh): Row Address Set

| Inst / Para               | D/CX | WRX | RDX | D17-8 | D7   | D6   | D5   | D4   | D3   | D2   | D1  | D0  | HEX   |
|---------------------------|------|-----|-----|-------|------|------|------|------|------|------|-----|-----|-------|
| RASET                     | 0    | 1   | 1   | -     | 0    | 0    | 1    | 0    | 1    | 0    | 1   | 1   | (2Bh) |
| 1<sup>st</sup> parameter  | 1    | 1   | 1   | -     | YS15 | YS14 | YS13 | YS12 | YS11 | YS10 | YS9 | YS8 |       |
| 2<sup>nd</sup> parameter  | 1    | 1   | 1   | -     | YS7  | YS6  | YS5  | YS4  | YS3  | YS2  | YS1 | YS0 |       |
| 3<sup>rd</sup> parameter  | 1    | 1   | 1   | -     | YE15 | YE14 | YE13 | YE12 | YE11 | YE10 | YE9 | YE8 |       |
| 4<sup>th</sup> parameter  | 1    | 1   | 1   | -     | YE7  | YE6  | YE5  | YE4  | YE3  | YE2  | YE1 | YE0 |       |

Description:

- The values of YS[15:0] and YE[15:0] are referred to when a RAMWR command arrives.
- Each value represents one page line in the frame memory.



## **Drawing Rectangles**

- Controller can accept multiple data values
  - Will store data in consecutive locations (increasing addresses)
  - Will wrap address based on XS and XE, YS and YE
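The auto-increment and wrap behavior can be modeled in a few lines. This is a toy simulation, not controller firmware: the `SIM_LCD_T` type, the `sim_*` function names, and the tiny 8 x 8 memory are all hypothetical, and the column-first increment order is an assumption that corresponds to the default memory access (MADCTL) setting.

```c
#include <stdint.h>

#define SIM_W 8   /* tiny hypothetical frame memory, not the real 240 x 320 */
#define SIM_H 8

typedef struct {
    uint16_t xs, xe, ys, ye;      /* window set by CASET / RASET */
    uint16_t x, y;                /* internal write pointer */
    uint16_t mem[SIM_H][SIM_W];   /* simulated frame memory */
} SIM_LCD_T;

/* Model of CASET + RASET: define the rectangle that writes may touch. */
static void sim_set_window(SIM_LCD_T *lcd, uint16_t xs, uint16_t xe,
                           uint16_t ys, uint16_t ye) {
    lcd->xs = xs; lcd->xe = xe;
    lcd->ys = ys; lcd->ye = ye;
}

/* Model of RAMWR: the write pointer resets to the start column/page. */
static void sim_ramwr(SIM_LCD_T *lcd) {
    lcd->x = lcd->xs;
    lcd->y = lcd->ys;
}

/* Model of one pixel of data: store it, then advance the pointer,
 * wrapping the column at XE and the row at YE. */
static void sim_write_pixel(SIM_LCD_T *lcd, uint16_t color) {
    lcd->mem[lcd->y][lcd->x] = color;
    if (lcd->x < lcd->xe) {
        lcd->x++;
    } else {
        lcd->x = lcd->xs;
        lcd->y = (lcd->y < lcd->ye) ? (uint16_t)(lcd->y + 1) : lcd->ys;
    }
}
```

With a 3 x 2 window, the seventh pixel written lands back on the window's first location, which is why a rectangle fill only needs the window set once.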



## **Drawing Rectangles**

- Set-up operations
  - Set column address: 5 writes
  - Set page (row) address: 5 writes
  - Start the 0x2C write command, but don't send data yet: 1 write
- Data write operation
  - MSB, LSB: 2 writes
- Total LCD operations: 11 + 2/pixel
  - Compare with 13 LCD operations/pixel in LCD\_Plot\_Pixel

```
uint32_t LCD_Start_Rectangle(PT_T * p1, PT_T * p2) {
  uint32_t n;
  uint16_t c_min, c_max, r_min, r_max;

  // Find bounds of rectangle
  c_min = MIN(p1->X, p2->X);
  c_max = MAX(p1->X, p2->X);
  r_min = MIN(p1->Y, p2->Y);
  r_max = MAX(p1->Y, p2->Y);

  // Clip to display size
  c_max = MIN(c_max, LCD_WIDTH-1);
  r_max = MIN(r_max, LCD_HEIGHT-1);

  // Number of pixels in the rectangle
  n = (c_max - c_min + 1) * (r_max - r_min + 1);

  // Restrict access to the rectangle; the write pointer resets to its start corner
  LCD_24S_Write_Command(0x002A);    // column address set
  LCD_24S_Write_Data(c_min >> 8);
  LCD_24S_Write_Data(c_min & 0xff); // start
  LCD_24S_Write_Data(c_max >> 8);
  LCD_24S_Write_Data(c_max & 0xff); // end

  LCD_24S_Write_Command(0x002B);    // page address set
  LCD_24S_Write_Data(r_min >> 8);
  LCD_24S_Write_Data(r_min & 0xff); // start
  LCD_24S_Write_Data(r_max >> 8);
  LCD_24S_Write_Data(r_max & 0xff); // end

  // Memory Write 0x2c
  LCD_24S_Write_Command(0x002c);

  return n;
}
```
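The operation counts for the two approaches can be compared directly. The sketch below only does the arithmetic from the slides (13 writes per pixel versus 11 set-up writes plus 2 per pixel); the function names are hypothetical.

```c
#include <stdint.h>

/* Bus writes needed to draw a w x h area one pixel at a time:
 * every pixel repeats the column set (5), page set (5),
 * memory write command (1) and two data bytes (2). */
static uint32_t writes_plot_pixel(uint32_t w, uint32_t h) {
    return w * h * 13u;
}

/* Bus writes needed to draw the same area as one rectangle:
 * the 11 set-up writes happen once, then 2 data bytes per pixel. */
static uint32_t writes_rectangle(uint32_t w, uint32_t h) {
    return 11u + w * h * 2u;
}
```

For the full 240 x 320 screen this gives 998,400 writes per-pixel versus 153,611 writes as a rectangle, roughly 6.5 times fewer. For a single pixel the two approaches cost the same 13 writes, so the gain comes entirely from amortizing the set-up.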

# **LCD BITMAPPED TEXT**

### Providing Text for LCD





- LCD controller does not have built-in character bitmaps
- Instead, user code has to render (draw) text
- Two options
  - Bitmap: set pixels based on a bitmap. Fast (but ugly if scaled much)
  - Vector: draw a series of lines. Slow (but beautiful with scaling)
- Will use bitmap (for speed)
  - Use free tool (GLCD Font Creator) to generate bitmaps from Windows fonts: https://www.mikroe.com/glcd-font-creator
  - Bitmap uses 1 for character foreground, 0 for background
  - Code needs to read bitmap and output foreground or background color for each pixel in bitmap

### Simple Text Rendering Code



- Loop iterates over each column in bitmap
- Calls LCD\_Plot\_Pixel per pixel
- Slow because of set-up overhead before each pixel of data

```
pixel_pos.Y = pos->Y;
for (row = 0; row < CHAR_HEIGHT; row++) {
    pixel_pos.X = pos->X;
    x_bm = 0;
    do {
        bitmap_byte = *glyph_data;
        for (col = 0; col < 8; col++) {
            if (bitmap_byte & 0x01) // if pixel is to be set
                pixel_color = fg_color;
            else
                pixel_color = bg_color;
            LCD_Plot_Pixel(&pixel_pos, pixel_color);
            bitmap_byte >>= 1;
            pixel_pos.X++;
            x_bm++;
        }
        glyph_data++;
    } while (x_bm < width);
    pixel_pos.Y++;
}
```

## LCD\_PrintChar Optimized for Pixel Runs

- Draws a rectangle for each run of pixels
- Identifies runs of pixels by examining existing bitmap data
  - If 0000 0000, then draw run of 8 background pixels
  - If 1111 1111, then draw run of 8 foreground pixels
  - If x000 0000, then draw run of 7 background pixels
  - Et cetera
  - Draw any remaining pixels in byte individually
- Could improve performance further by changing bitmap data to encode run information

```
// Special cases with run starting at LSB
if (bitmap_byte == 0x00) {                      // Up to 8 bit run, background color
    num_pixels = MIN(8, glyph_width - x_bm);
    LCD_Write_Rectangle_Pixel(&bg, num_pixels);
    x_bm += num_pixels;
} else if (bitmap_byte == 0xff) {               // Up to 8 bit run, foreground color
    num_pixels = MIN(8, glyph_width - x_bm);
    LCD_Write_Rectangle_Pixel(&fg, num_pixels);
    x_bm += num_pixels;
} else {
    col = 0;
    num_pixels = 0;
    if ((bitmap_byte & 0x7f) == 0) {            // Up to 7 bit run
        num_pixels = MIN(7, glyph_width - x_bm);
        LCD_Write_Rectangle_Pixel(&bg, num_pixels);
    } else if ((bitmap_byte & 0x7f) == 0x7f) {
        num_pixels = MIN(7, glyph_width - x_bm);
        LCD_Write_Rectangle_Pixel(&fg, num_pixels);
    } else if ((bitmap_byte & 0x3f) == 0) {     // Up to 6 bit run
        // (excerpt ends here; shorter runs follow the same pattern)
```
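The chain of mask comparisons above finds the longest run of identical bits that starts at the LSB. The same quantity can be computed with a short loop. This is an alternative sketch, not the code used in `LCD_PrintChar`; the helper name `lsb_run_length` and its parameters are hypothetical.

```c
#include <stdint.h>

/* Return the length of the run of identical bits starting at the LSB of
 * `byte`, limited to `max_bits` (expected range 0..8).
 * `*is_fg` is set to 1 if the run is foreground (bits are 1), else 0. */
static uint8_t lsb_run_length(uint8_t byte, uint8_t max_bits, uint8_t *is_fg) {
    uint8_t first = byte & 0x01;   /* value of the first pixel in the run */
    uint8_t n = 0;

    while (n < max_bits && ((byte >> n) & 0x01) == first) {
        n++;
    }
    *is_fg = first;
    return n;
}
```

A caller could pass `MIN(8, glyph_width - x_bm)` as `max_bits`, draw the run as one rectangle write, shift the byte right by the returned length, and repeat. Whether the loop beats the unrolled comparisons depends on the compiler and the typical glyph data, so it would need measuring.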

# USING SIMD FOR PIXEL DATA REFORMATTING

## We're Using Only Part of 32-bit Data Path in CPU



### Four-Wide SIMD?

- SIMD: Single Instruction, Multiple Data
  - Have each register hold multiple data elements (mini-vector)
  - Now one instruction can process multiple data elements
- Want to process four bytes in parallel in a 32-bit register





### Must Reorganize Data and Interface

- Original code
  - Pixel color is in a structure with color components
    - typedef struct { uint8\_t R, G, B; } COLOR\_T;
  - Image is an array of structures
    - COLOR\_T image[W][H]
  - One component of one pixel is loaded into each register
  - Loading a register with a full word from memory just gets the R, G, B components of a single pixel: not useful
- Changes needed
  - Pass at least four pixels of data to function at a time
  - Reorganize data in memory so loading a register with a word will get a component (e.g. R) from four adjacent pixels

### New Data Organization and Interface

- Pass at least four pixels of data to function at a time
- Reorganize data in memory
  - Loading a register (LDR) should get a component (e.g. R) from four adjacent pixels
  - Reorganize data into structure of arrays

```
struct {
  uint8_t R[W*H],
    G[W*H],
    B[W*H];
}
```
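The reorganization from an array of structures to a structure of arrays can be sketched as follows. The type names `AOS_PIXEL_T` and `SOA_IMAGE_T`, the helper names, and the 8-pixel image size are hypothetical and chosen only to keep the example small.

```c
#include <stdint.h>
#include <string.h>

#define NPIX 8   /* hypothetical image size, stands in for W*H */

typedef struct { uint8_t R, G, B; } AOS_PIXEL_T;               /* original layout */
typedef struct { uint8_t R[NPIX], G[NPIX], B[NPIX]; } SOA_IMAGE_T; /* new layout */

/* Copy n pixels (n <= NPIX) from array-of-structures to structure-of-arrays. */
static void aos_to_soa(const AOS_PIXEL_T *src, SOA_IMAGE_T *dst, uint32_t n) {
    for (uint32_t i = 0; i < n; i++) {
        dst->R[i] = src[i].R;
        dst->G[i] = src[i].G;
        dst->B[i] = src[i].B;
    }
}

/* Load one component of four adjacent pixels as a single 32-bit word.
 * memcpy sidesteps alignment and aliasing concerns; compilers commonly
 * reduce it to a single load instruction. */
static uint32_t load_quad(const uint8_t *component) {
    uint32_t w;
    memcpy(&w, component, sizeof w);
    return w;
}
```

After conversion, `load_quad(&img.R[4])` returns the red values of pixels 4 through 7 in one register. On a little-endian core such as a Cortex-M, pixel 4 ends up in the least significant byte.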



### Four-Wide SIMD

- Load three color components into three registers
  - R: Four reds, G: four greens, B: four blues
- Mask off and shift color component bits of interest
  - Four reds, four high greens, four low greens, four blues
- Merge to create W1 (set of four first bytes)
- Merge to create W2 (set of four second bytes)
- Send out four pairs of bytes sequentially
  - Extract b1 and b2 from W1 and W2
  - Write the data
  - Shift W1 and W2 to prep for next pair of bytes

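```
void LCD_Write_Rectangle_N_Quad_Pixel_Components(
        uint32_t * aR, uint32_t * aG,
        uint32_t * aB, int32_t n) {
    uint8_t b1, b2;
    uint8_t i;
    uint32_t R, G, B, GH, GL, W1, W2;

    do {
        // Load one component from four adjacent pixels into each register
        R = *aR++;
        G = *aG++;
        B = *aB++;

        // Mask and shift the bits of interest, four pixels at a time
        R &= 0xf8f8f8f8;
        GH = (G & 0xe0e0e0e0) >> 5;
        GL = (G & 0x1c1c1c1c) << 3;  // same shift as b2 in LCD_Plot_Pixel
        B = (B & 0xf8f8f8f8) >> 3;

        // Merge into four first bytes (W1) and four second bytes (W2)
        W1 = R | GH;
        W2 = GL | B;

        // Send out four pairs of bytes sequentially
        for (i = 0; i < 4; i++) {
            b1 = W1 & 0x000000ff;
            b2 = W2 & 0x000000ff;
            LCD_24S_Write_Data(b1);
            LCD_24S_Write_Data(b2);
            W1 >>= 8;
            W2 >>= 8;
        }
        n--;  // assumption: n counts groups of four pixels (the slide cuts off before the loop end)
    } while (n > 0);
}
```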
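One way to gain confidence in the four-wide packing is to check it against the one-pixel-at-a-time expressions. The sketch below separates the packing from the LCD writes so it can run on a host; the function names are hypothetical. `GL` is shifted left by 3 so that it matches the scalar `b2` expression in `LCD_Plot_Pixel`, and the component words are assumed to hold pixel 0 in the least significant byte.

```c
#include <stdint.h>

/* Pack four pixels at once. R, G and B each hold one 8-bit component
 * from four adjacent pixels. out[] receives b1, b2 for pixel 0, then
 * b1, b2 for pixel 1, and so on. */
static void pack_quad_rgb565(uint32_t R, uint32_t G, uint32_t B, uint8_t out[8]) {
    uint32_t GH = (G & 0xe0e0e0e0u) >> 5;             /* green bits for b1 */
    uint32_t GL = (G & 0x1c1c1c1cu) << 3;             /* green bits for b2 */
    uint32_t W1 = (R & 0xf8f8f8f8u) | GH;             /* four first bytes  */
    uint32_t W2 = GL | ((B & 0xf8f8f8f8u) >> 3);      /* four second bytes */

    for (int i = 0; i < 4; i++) {
        out[2 * i]     = (uint8_t)(W1 & 0xffu);
        out[2 * i + 1] = (uint8_t)(W2 & 0xffu);
        W1 >>= 8;
        W2 >>= 8;
    }
}

/* Scalar reference: the per-pixel expressions from LCD_Plot_Pixel. */
static void pack_one_rgb565(uint8_t r, uint8_t g, uint8_t b, uint8_t out[2]) {
    out[0] = (uint8_t)((r & 0xf8) | ((g & 0xe0) >> 5));
    out[1] = (uint8_t)(((g & 0x1c) << 3) | ((b & 0xf8) >> 3));
}
```

Because every mask keeps each field inside its own byte lane, the shifts never move bits from one pixel into its neighbor, which is what makes this plain-integer form of SIMD safe without dedicated SIMD instructions.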

# **APPENDIX**

### Commands

- Software Reset
- Read Display ID
- Read Display Status
- Read Display Power Mode
- Read Display MADCTL
- Read Display Pixel Format
- Read Display Image Mode
- Read Display Signal Mode
- Read Display Self-Diagnostic Result
- Sleep In/Out
- Partial Display Mode On
- Normal Display Mode On
- Display Inversion Off/On
- Gamma Set
- Display Off/On
- Column Address Set
- Row Address Set
- Memory Read/Write
- Partial Area
- Vertical Scrolling Definition
- Tearing Effect Line Off/On
- Memory Data Access Control
- Vertical Scroll Start Address of RAM
- Idle Mode Off/On
- Interface Pixel Format
- Read/Write Memory Continue
- Get/Set Tear Scanline
- Read/Write Display Brightness
- Read/Write CTRL Display
- Read/Write Content Adaptive Brightness Control and Color Enhancement
- Read/Write CABC Minimum Brightness
- Read ID1
- Read ID2
- Read ID3
- RAM Control
- RGB Interface Control
- Porch Setting
- Frame Rate Control 1 (In partial mode/idle colors)
- Gate Control
- Digital Gamma Enable
- VCOM Setting
- LCM Control
- ID Code Setting
- VDV and VRH Command Enable
- VRH Set
- VDV Set
- VCOM Offset Set
- Frame Rate Control in Normal Mode
- CABC Control
- Register Value Selection 1
- Register Value Selection 2
- Power Control 1
- Enable VAP/VAN signal output
- Positive/Negative Voltage Gamma Control
- Digital Gamma Look-up Table for Red/Blue
- Gate Control
- SPI2 Enable
- Power Control 2
- Equalize time control
- Program Mode Control
- Program Mode Enable
- NVM Setting
- Program action

## Interface Signals



### Write Command or Data



## Read Parameter or Display Data



## Parallel Interface Timing



- Minimum write cycle time (T<sub>WC</sub>): 66 ns \* 48 MHz = 3.168 instruction cycles
- A software-implemented bus **probably** cannot run fast enough to violate this minimum timing requirement
- DMA would be able to (if configured correctly)

VDDI=1.65 to 3.3V, VDD=2.4 to 3.3V, AGND=DGND=0V, Ta= -30 to 70 °C

| Signal   | Symbol             | Parameter                          | Min | Max | Unit | Description                    |
|----------|--------------------|------------------------------------|-----|-----|------|--------------------------------|
| D/CX     | T<sub>AST</sub>    | Address setup time                 | 0   |     | ns   |                                |
| D/CX     | T<sub>AHT</sub>    | Address hold time (Write/Read)     | 10  |     | ns   |                                |
| CSX      | T<sub>CHW</sub>    | Chip select "H" pulse width        | 0   |     | ns   |                                |
| CSX      | T<sub>CS</sub>     | Chip select setup time (Write)     | 15  |     | ns   |                                |
| CSX      | T<sub>RCS</sub>    | Chip select setup time (Read ID)   | 45  |     | ns   |                                |
| CSX      | T<sub>RCSFM</sub>  | Chip select setup time (Read FM)   | 355 |     | ns   |                                |
| CSX      | T<sub>CSF</sub>    | Chip select wait time (Write/Read) | 10  |     | ns   |                                |
| CSX      | T<sub>CSH</sub>    | Chip select hold time              | 10  |     | ns   |                                |
| WRX      | T<sub>WC</sub>     | Write cycle                        | 66  |     | ns   |                                |
| WRX      | T<sub>WRH</sub>    | Control pulse "H" duration         | 15  |     | ns   |                                |
| WRX      | T<sub>WRL</sub>    | Control pulse "L" duration         | 15  |     | ns   |                                |
| RDX (ID) | T<sub>RC</sub>     | Read cycle (ID)                    | 160 |     | ns   | When reading ID data           |
| RDX (ID) | T<sub>RDH</sub>    | Control pulse "H" duration (ID)    | 90  |     | ns   | When reading ID data           |
| RDX (ID) | T<sub>RDL</sub>    | Control pulse "L" duration (ID)    | 45  |     | ns   | When reading ID data           |
| RDX (FM) | T<sub>RCFM</sub>   | Read cycle (FM)                    | 450 |     | ns   | When reading from frame memory |
| RDX (FM) | T<sub>RDHFM</sub>  | Control pulse "H" duration (FM)    |     |     | ns   | When reading from frame memory |
| RDX (FM) | T<sub>RDLFM</sub>  | Control pulse "L" duration (FM)    | 355 |     | ns   | When reading from frame memory |
| D[17:0]  | T<sub>DST</sub>    | Data setup time                    | 10  |     | ns   | For CL=30pF                    |
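The write-cycle arithmetic can be kept in integers by noting that nanoseconds times megahertz gives thousandths of a cycle. The sketch below uses hypothetical helper names and also estimates the shortest possible transfer time if every write ran at the 66 ns limit.

```c
#include <stdint.h>

/* Write cycle time expressed in CPU clock cycles, scaled by 1000.
 * ns * MHz = 1e-9 s * 1e6 /s = 1e-3 cycles, so the product is in
 * milli-cycles and no floating point is needed. */
static uint32_t write_cycle_millicycles(uint32_t t_wc_ns, uint32_t cpu_mhz) {
    return t_wc_ns * cpu_mhz;
}

/* Lower bound on the time to issue `writes` bus writes, in nanoseconds,
 * if each write takes exactly the minimum write cycle time. */
static uint64_t min_transfer_ns(uint32_t writes, uint32_t t_wc_ns) {
    return (uint64_t)writes * t_wc_ns;
}
```

For example, `write_cycle_millicycles(66, 48)` yields 3168, matching the 3.168 cycles above. A full-screen rectangle update of 153,611 writes could not finish in less than about 10.1 ms at the bus limit, so the bus timing itself is not the bottleneck for a software-driven interface.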

### Font Data Structures

### Font Array (uint8\_t [])

- Font Header
- Glyph Index (one per glyph)
- Glyph Bitmap Data

