Phillip Stanley-Marbell
Foundations of Embedded Systems
Department of Engineering, University of Cambridge
http://physcomp.eng.cam.ac.uk
Topic 06: Digital Logic, FPGAs, and Verilog
(~60 minutes)
Version 0.2020
Pre-Recorded Video
Intended Learning Outcomes for This Topic
By the end of this topic, you should be able to:
Enumerate differences between a programmable processor and programmable logic
Enumerate the differences between PALs, PLAs/CPLDs, and FPGAs
Describe the fundamental difference between programming languages and HDLs
Identify several different methods to use circuits to achieve computation
Conceptualize an embedded system design based on the iCE40 FPGA
buslock=0, buslocker=-1, EX touches mem = 0
WB: [TAS],0
MA: []
EX: [ANDI],1
ID: [0xe100],0
IF: [0x6013],0
node ID=0, PC=0x80042ba, ICLK=1668327, sleep?=0
buslock=1, buslocker=0, EX touches mem = 0
WB: []
MA: [ANDI],2
EX: []
ID: []
IF: [0xd109],0
node ID=0, PC=0x80042bc, ICLK=1668328, sleep?=0
buslock=1, buslocker=0, EX touches mem = 0
WB: []
MA: [ANDI],1
EX: []
ID: [0xd109],0
IF: [0x6413],0
node ID=0, PC=0x80042be, ICLK=1668329, sleep?=0
buslock=1, buslocker=0, EX touches mem = 1
WB: [ANDI],0
MA: []
EX: [MOVBSG],1
ID: [0x6413],0
IF: [0xd109],0
node ID=0, PC=0x80042c0, ICLK=1668330, sleep?=0
buslock=1, buslocker=0, EX touches mem = 0
WB: []
MA: [MOVBSG],0
EX: [MACL],1
ID: [0xd109],0
IF: [0x410b],0
Multiplexing Hardware in Time: Microprocessors
[Figure: block diagram of a pipelined processor. Pipeline stages: Fetch, Decode, Execute, Memory Access, Register File write-back. Other blocks: Program Counter, 16 + 8 architectural registers, Cache, Main memory, Memory Management Unit (MMU), Interrupt Controller, programmable clock source, and memory-mapped peripherals (Timer / RTC, UART, A/D Converter, Battery Monitor, Failure Monitor, Network Interface), connected by clk, addr, and data signals. Legend: marked structures are modeled at bit-level, enabling monitoring of signal transition activity and SEUs during simulation]
▶︎ See Sunflower showpipe command
Multiplexing Hardware in Time: Microprocessors
It’s all just a collection of gates
Logic gates are chained in space to construct adders, multipliers, whole ALUs, pipelines, and so on
The whole processor is reused in time, to execute multiple iterations of algorithms
[Figure: gate-level ALU with inputs a and b and output a op b, forming the EX stage. Image source: Wikipedia]
Multiplexing Hardware in Space: Programmable Logic Devices
Microprocessors
▶︎ Fixed hardware; achieve different functionality by loading different programs
▶︎ Logic gates are connected in a fixed configuration to construct adders, multipliers, whole ALUs, and so on
Programmable Logic
▶︎ Programmable in this case means configurable
▶︎ Achieve different functionality by wiring up generic components
▶︎ Generic components may be a collection of ANDs and ORs, a collection of lookup tables (LUTs), etc.
[Figure: ALU with inputs a and b and output a op b, forming the EX stage]
PALs, PLAs (CPLDs), and FPGAs
Historical progression
▶︎ The earliest programmable logic devices were one-time mask-programmable, not reprogrammable
▶︎ One of the earliest reprogrammable logic array devices was the Altera EP300 (1984)
Source: Altera
PALs, PLAs (CPLDs), and FPGAs
PAL Architecture: Programmable AND array and fixed OR Array
▶︎ Design is broken up into macrocells. Each macrocell is a sum of products
▶︎ Any of the macrocell’s inputs, or its complement, can be routed to any AND gate in the AND array
▶︎ All the product terms are summed in a single OR gate for each macrocell
PALs (and PLAs, which we will see next) are a good match for designs that are mostly combinational logic
Source: Altera
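The sum-of-products structure of a PAL macrocell can be sketched in Verilog. This is only an illustration: the module name, signal names, and the particular product terms are hypothetical and not taken from any vendor device.

```verilog
// Hypothetical PAL-style macrocell: a sum of products over three inputs.
// Each product term ANDs a chosen subset of the inputs or their complements
// (the programmable AND array); one OR gate sums the terms (the fixed OR array).
module palMacrocell(input a, input b, input c, output f);
	wire	p0 = a & ~b;		// product term 0
	wire	p1 = ~a & b & c;	// product term 1
	wire	p2 = b & ~c;		// product term 2

	assign	f = p0 | p1 | p2;	// single OR gate summing all product terms
endmodule
```

"Programming" the device corresponds to choosing which inputs (or complements) feed each AND gate; the OR gate itself is not configurable.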
PALs, PLAs (CPLDs), and FPGAs
PLA/CPLD: Programmable AND array and programmable OR Array
▶︎ Design is again broken up into macrocells
(Xilinx CoolRunner-II)
Source: Xilinx
PALs vs PLAs/CPLDs
PAL Architecture: Programmable AND array, fixed OR array (Source: Altera)
PLA/CPLD Architecture: Programmable AND and OR arrays (Source: Xilinx)
Both PALs and PLAs/CPLDs: have few sequential logic elements; do not scale to large designs
Field-Programmable Gate Arrays (FPGAs)
Fine-grained: A large collection of generic LUTs rather than AND and OR arrays
All can be wired together in essentially arbitrary topologies
Source: Lattice
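A lookup table can be modeled as a small truth table indexed by its inputs. The sketch below is a generic model with hypothetical module and parameter names (lut4, INIT); it is not the vendor's LUT primitive.

```verilog
// Hypothetical model of a 4-input lookup table (LUT).
// INIT holds the 16-entry truth table; the four inputs select one entry.
module lut4 #(parameter [15:0] INIT = 16'h0000) (
	input  [3:0]	in,
	output		out
);
	wire [15:0]	truthTable = INIT;

	assign out = truthTable[in];
endmodule

// The same LUT structure implements different functions depending only
// on the truth table loaded into it.
module lutExamples(input [3:0] x, output and4, output xor4);
	lut4 #(.INIT(16'h8000)) andLut(.in(x), .out(and4));	// 1 only when x == 4'b1111
	lut4 #(.INIT(16'h6996)) xorLut(.in(x), .out(xor4));	// 1 when x has an odd number of 1s
endmodule
```

Any Boolean function of four inputs fits in one such table, which is why a large array of LUTs plus routing is a general-purpose fabric.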
The Focus FPGA for This Topic: Lattice iCE40 Ultra Plus FPGA
Source: Lattice
Size Comparison: Lattice iCE40 vs Typical FPGAs
Think of the iCE40 as an IC with two SPI interfaces, two I2C interfaces, about 5000 4-input lookup tables and D-flip-flops, and ~20 configurable I/O buffers and differential amplifiers which you can wire together in (essentially) any topology of your choosing
Source: Lattice
Source: Xilinx
iCE40: 8 Logic Cells (LCs) = 1 Programmable Logic Block (PLB)
Source: Lattice
Hardware Description Languages (HDLs)
HDLs allow you to describe hardware
▶︎ Behavioral HDL: Describe what happens when
▶︎ Structural HDL: Describe components and their connections
There are many existing and historical HDLs, ranging from ABEL, to Bluespec and Clash, to VHDL
▶︎ VHDL: VHSIC (Very High Speed Integrated Circuit) Hardware Description Language
▶︎ Verilog: We will use the Verilog HDL
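To make the behavioral/structural distinction concrete, here is one possible sketch of the same 2-to-1 multiplexer written both ways. The module and signal names are hypothetical.

```verilog
// Behavioral description: state what the output is under each condition.
module mux2Behavioral(input a, input b, input sel, output reg y);
	always @(*) begin
		if (sel)
			y = b;
		else
			y = a;
	end
endmodule

// Structural description: instantiate gate primitives and wire them together.
module mux2Structural(input a, input b, input sel, output y);
	wire	nsel, t0, t1;

	not	g0(nsel, sel);		// nsel = ~sel
	and	g1(t0, a, nsel);	// t0 = a when sel is 0
	and	g2(t1, b, sel);		// t1 = b when sel is 1
	or	g3(y, t0, t1);
endmodule
```

Both modules describe the same function; a synthesis tool is free to map either one onto the same LUT.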
Why We Use Verilog
You have already been introduced to VHDL in other modules/courses
The available open-source FPGA tools (Yosys/NextPNR) support only Verilog
Knowing another HDL will make you a more versatile engineer
Many engineers and researchers use both
Digital Hardware Designs Made of Interconnected Modules
We’ll design digital hardware to achieve a given computational task using:
▶︎ Modules to implement pieces of functionality
▶︎ A part of the design, which we’ll call the “toplevel” (see the tools note below), that connects the modules together
The modules might be purely combinational or might contain combinational and sequential logic
Combinational modules: inputs appear at outputs after the time taken for signals to propagate
Sequential modules contain state: flip-flops (clocked) and latches (unclocked)
[Figure: a module drawn as a box with inputs in1 … inN and outputs out1 … outM]
Tools note: yosys assumes the last Verilog file in its arguments is the toplevel
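As a rough sketch of these ideas, the following hypothetical design has one combinational module, one sequential module, and a toplevel that connects them into a 4-bit accumulator. All names are illustrative; the toplevel is placed last, in line with the tools note.

```verilog
// Combinational module: the output depends only on the current inputs.
module adder4(input [3:0] a, input [3:0] b, output [3:0] sum);
	assign sum = a + b;
endmodule

// Sequential module: a bank of D flip-flops holds state across clock edges.
module register4(input clk, input [3:0] d, output reg [3:0] q);
	initial q = 4'd0;	// assumed power-up value, for simulation

	always @(posedge clk)
		q <= d;
endmodule

// Toplevel: connects the two modules so that acc <= acc + in on each clock edge.
module toplevel(input clk, input [3:0] in, output [3:0] acc);
	wire [3:0] next;

	adder4		add(.a(acc), .b(in), .sum(next));
	register4	state(.clk(clk), .d(next), .q(acc));
endmodule
```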
Syntax of Verilog
Introduction to Verilog Syntax: A Simple Verilog Module
Alternative syntax for ports list:
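A minimal sketch of a simple module written with each of the two port-list styles follows. The inverter and its names are hypothetical examples, not the module shown in the lecture.

```verilog
// Style 1 (Verilog-1995): names in the port list, directions declared in the body.
module inverterA(in, out);
	input	in;
	output	out;

	assign	out = ~in;
endmodule

// Style 2 (Verilog-2001, "ANSI style"): directions declared in the port list itself.
module inverterB(input in, output out);
	assign	out = ~in;
endmodule
```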
Instantiating Modules
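One way to illustrate instantiation is a 3-input AND built from two instances of a 2-input AND, using both positional and named port connections. The module and instance names are hypothetical.

```verilog
module and2(input a, input b, output y);
	assign y = a & b;
endmodule

module and3(input a, input b, input c, output y);
	wire ab;

	// Positional connection: signals bind to ports in declaration order.
	and2 first(a, b, ab);

	// Named connection: each port is bound explicitly, in any order.
	and2 second(.y(y), .a(ab), .b(c));
endmodule
```

Named connections are usually the safer choice, since they keep working if the port order of the instantiated module changes.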
Logic Values in Verilog
Signals can take on one of four values: 0, 1, Z, X
0: Boolean value 0 / low logic value
1: Boolean value 1 / high logic value
Z: High impedance or floating (synonymous with ‘?’)
X: Unknown or don’t care
Verilog Radix Notation Examples
1'b0
8'b00010100
8'hff
8'b10000100: width in bits (8), radix (b, binary), value (10000100)
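The literals above can be tried out in a small simulation-only module. This is a sketch for experimentation; the module name is hypothetical.

```verilog
module radixExamples;
	initial begin
		$display("%b", 1'b0);			// 1-bit binary constant
		$display("%d", 8'b00010100);		// 8-bit binary constant, value 20
		$display("%d", 8'hff);			// 8-bit hexadecimal constant, value 255
		$display("%b", 8'bzzzz_xxxx);		// z and x are legal digits; _ is a separator
		$display("%b", 4'b1x0z === 4'b1x0z);	// === compares x and z literally: 1
		$display("%b", 4'b1x0z ==  4'b1x0z);	// == with x or z operands yields x
	end
endmodule
```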
Declaring Wires, Registers, and Memories
wire someWire;
A single-bit wire
wire [7:0] someBus;
An 8-bit wire (bus). The specifier [7:0] is a convention: it declares the same width as [1:8], [8:1], or [0:7]; only the bit numbering differs. The Verilog tools simply look at [index1:index2] and allocate a width of abs(index1-index2)+1
reg someRegister;
A 1-bit register
reg [7:0] someRegister;
An 8-bit register
reg [7:0] someMemory[0:31];
A 32-byte memory. Memories you declare this way will be synthesized to embedded block RAMs (if available / if possible), else to flip-flops
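These declarations can be combined into a small working module. The sketch below is a hypothetical 32-byte memory with a registered read port; the port names are assumptions, and whether a tool maps it to block RAM depends on the tool and device.

```verilog
module smallMemory(
	input		clk,
	input		writeEnable,
	input  [4:0]	address,	// 5 address bits select one of 32 bytes
	input  [7:0]	writeData,
	output [7:0]	readData
);
	reg [7:0]	someMemory[0:31];	// 32 entries, each 8 bits wide
	reg [7:0]	someRegister;		// 8-bit register holding the last value read

	always @(posedge clk) begin
		if (writeEnable)
			someMemory[address] <= writeData;
		someRegister <= someMemory[address];	// reads the value stored before this edge
	end

	assign readData = someRegister;
endmodule
```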
Bitwise Operators in Verilog
~	Bitwise NOT
&	Bitwise AND
|	Bitwise OR
^	Bitwise XOR
You can apply them to single bits or to buses (see later slide where we use ~ on a 12-bit bus)
Logical Operators in Verilog
!	Logical NOT
&&	Logical AND
||	Logical OR
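The difference between the bitwise and logical operators shows up most clearly on multi-bit values. A small simulation-only sketch, with hypothetical names and operand values:

```verilog
module operatorExamples;
	reg [3:0] a, b;

	initial begin
		a = 4'b1010;
		b = 4'b0101;

		$display("%b", a & b);	// bitwise AND, bit by bit: 0000
		$display("%b", a | b);	// bitwise OR: 1111
		$display("%b", a ^ b);	// bitwise XOR: 1111
		$display("%b", ~a);	// bitwise NOT: 0101
		$display("%b", a && b);	// logical AND: both operands are non-zero, so 1
		$display("%b", a || b);	// logical OR: at least one operand is non-zero, so 1
		$display("%b", !a);	// logical NOT: a is non-zero, so 0
	end
endmodule
```

Note that a & b is zero here while a && b is one: the bitwise form produces a 4-bit result, the logical form a single truth value.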
Unary Reduction Operators in Verilog
&	AND Reduction operator
~&	NAND Reduction operator
|	OR Reduction operator
~|	NOR Reduction operator
^	XOR Reduction operator
~^	XNOR Reduction operator
Reduction operators: Apply an operation across all the bits of a multi-bit signal
Example: Compute the parity of a multi-bit signal: assign parity = ^result[31:0];
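The parity example can be exercised in simulation as follows. The module name and the extra allOnes and anySet signals are hypothetical additions for illustration.

```verilog
module parityExample;
	reg  [31:0]	result;
	wire		parity  = ^result[31:0];	// 1 when an odd number of bits are set
	wire		allOnes = &result;		// 1 only when every bit is 1
	wire		anySet  = |result;		// 1 when at least one bit is 1

	initial begin
		result = 32'h0000_0007;		// three bits set
		#1 $display("parity=%b allOnes=%b anySet=%b", parity, allOnes, anySet);	// 1 0 1
		result = 32'hffff_ffff;		// all 32 bits set
		#1 $display("parity=%b allOnes=%b anySet=%b", parity, allOnes, anySet);	// 0 1 1
	end
endmodule
```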
Simulation and Testbenches in Verilog
Testbenches
Recall: Testbenches are test harnesses that you use to test your design
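A testbench is itself a Verilog module, usually with no ports, that instantiates the design and drives its inputs. The sketch below uses a hypothetical design named simple (an XOR gate); in practice the two modules would live in separate files such as simple.v and simpleTestbench.v.

```verilog
`timescale 1ns/1ps

// Design under test; the contents are a hypothetical example.
module simple(input a, input b, output y);
	assign y = a ^ b;
endmodule

// Testbench: no ports; it drives the inputs and records the resulting signals.
module simpleTestbench;
	reg	a, b;
	wire	y;

	simple dut(.a(a), .b(b), .y(y));

	initial begin
		$dumpfile("simpleTestbench.vcd");	// waveform file for a viewer
		$dumpvars(0, simpleTestbench);		// record all signals in this hierarchy

		a = 0; b = 0; #10;
		a = 0; b = 1; #10;
		a = 1; b = 0; #10;
		a = 1; b = 1; #10;
		$finish;
	end

	initial
		$monitor("t=%0t a=%b b=%b y=%b", $time, a, b, y);
endmodule
```

The $dumpfile and $dumpvars calls are what cause a simulator to write a signal trace; without them the simulation runs but no waveform file is produced.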
Icarus Verilog (iverilog)
iverilog is a tool to allow you to simulate your Verilog design
It compiles your Verilog design into an executable
When you run the generated executable, it simulates your testbench; if the testbench calls $dumpfile and $dumpvars, it writes a signal trace to the named file, conventionally with the extension .vcd
You can then view the signal trace using a waveform viewer such as gtkwave
$ iverilog -o simpleTestbench simple.v simpleTestbench.v
$ ./simpleTestbench
Using gtkwave
$ gtkwave simpleTestbench.vcd
An End-to-End Example
Verilog Example: VDBS Encoder
[Figure: sensors (acceleration, sound, pressure, etc.) connected to a microcontroller/DSP over SPI/I2C and I2S links, each with clock and data lines]
These interfaces do not employ any channel modulation
Each 1→0 or 0→1 transition costs energy
Moving data costs energy: 10–100pJ/bit on-chip, 1000–10,000pJ/bit off-chip
At 1Mb/s, communication power on printed circuit boards is between 2μW and 40μW
This is up to 13% of power dissipation of an ARM Cortex-M0+ running at 2MHz/3V
[SMR2016b] P. Stanley-Marbell and M. Rinard. “Reducing Serial I/O Power in Error-Tolerant Applications by Efficient Lossy Encoding”, IEEE/ACM DAC, 2016.
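Verilog Example: VDBS Encoder
Let $\#_\delta^{l}(s)$ be the number of transitions in $s$ and let
$$\Delta^{l}_{s,t} = \left|\#_\delta^{l}(s) - \#_\delta^{l}(t)\right|.$$
We define a predicate $P$ in terms of $\#_\delta^{l}(s)$:
$$P_{s,t,m} = \left(|s - t| \le m\right) \wedge \left(\#_\delta^{l}(s) - \#_\delta^{l}(t) \ge 0\right).$$
Predicate satisfied by value-deviation-bounded serial (VDBS) encoders
Given input $s$ and tolerable error $m$, optimum transition-reducing encoding of $s$ is given by value $\tau$ such that
$$e^{l}_{1}(s, m) = \left(\tau \ \text{s.t.}\ P_{s,\tau,m} \wedge \left(|s - \tau| = \min_{0<i<2^{l}-1} |s - i|\right)\right),$$
$$e^{l}_{2}(s, m) = \left(\tau \ \text{s.t.}\ P_{s,\tau,m} \wedge \left(|s - \tau| = \max_{0<i<2^{l}-1} |s - i|\right)\right),$$
$$e^{l}_{3}(s, m) = \left(\tau \ \text{s.t.}\ P_{s,\tau,m} \wedge \left(\Delta^{l}_{s,\tau} = \min_{0<i<2^{l}-1} \Delta^{l}_{s,i}\right)\right),$$
$$e^{l}_{4}(s, m) = \left(\tau \ \text{s.t.}\ P_{s,\tau,m} \wedge \left(\Delta^{l}_{s,\tau} = \max_{0<i<2^{l}-1} \Delta^{l}_{s,i}\right)\right).$$
Can halve number of transitions while inducing minimal error (e.g., just 0.4% for 8-bit values)
[Figure: two transfers with $l = 8$ and $m = 13$ ($\log_2(m)$ bits). Source to Destination, unencoded: $s = 64_{10} = 01000000_2$ is sent as 0 1 0 0 0 0 0 0. Encoder to Destination: $t = 63_{10} = 00111111_2$ is sent instead, as 0 0 1 1 1 1 1 1]
[SMR2016b] P. Stanley-Marbell and M. Rinard. “Reducing Serial I/O Power in Error-Tolerant Applications by Efficient Lossy Encoding”, IEEE/ACM DAC, 2016.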
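The transition counts in the example can be checked with a short simulation. This is a sketch under one assumption: it counts only transitions between adjacent bits within the 8-bit word, ignoring the line state before and after it. The module and function names are hypothetical.

```verilog
module transitionCount;
	// Number of adjacent bit positions of an 8-bit word whose values differ.
	function [3:0] countTransitions(input [7:0] word);
		reg [6:0] diff;
		integer i;
		begin
			diff = word[7:1] ^ word[6:0];	// 1 wherever two neighboring bits differ
			countTransitions = 0;
			for (i = 0; i < 7; i = i + 1)
				countTransitions = countTransitions + diff[i];
		end
	endfunction

	initial begin
		$display("%0d", countTransitions(8'b01000000));	// s = 64: prints 2
		$display("%0d", countTransitions(8'b00111111));	// t = 63: prints 1
	end
endmodule
```

Sending 63 instead of 64 changes the value by 1 out of 256 (about 0.4%) while halving the number of transitions from 2 to 1.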
Verilog Example: VDBS Encoder
Pedometer / Step Counting System
[Figure: system blocks: Accelerometer, VDBS Encoder, Processor. Processing blocks: Low-Pass Filter, Maximum-Activity Axis Selection, Extremal Value Marking, Step Count, with x-component, y-component, and z-component analysis]
[Figure: labeled components: ARM Cortex M3 Microcontroller, Bluetooth Low-Energy IC, Accelerometer IC. (Source: ifitxit.com) (Source: Fitbit)]
Pedometer algorithm: N. Zhao. “Full-Featured Pedometer Design Realized with 3-Axis Digital Accelerometer”. Analog Dialogue, 44(06), June 2010.
[SMR2016b] P. Stanley-Marbell and M. Rinard. “Reducing Serial I/O Power in Error-Tolerant Applications by Efficient Lossy Encoding”, IEEE/ACM DAC, 2016.
Verilog Example: VDBS Encoder
[Figure: pedometer processing stages (accelerometer data, low-pass filter, maximal activity axis, extremal-value marking), plotted once without VDBS encoding and once with VDBS. The step counts reported in the two cases are 19 and 20]
With VDBS: VDBS reduces transitions by 54%
3-axis accelerometer data from WISDM activity recognition dataset: J. R. Kwapisz, et al. “Activity recognition using cell phone accelerometers”. SIGKDD Explor. Newsl., 12(2):74–82, Mar. 2011.
VDBS Has Minimal Effect on Pedometer Behavior
VDBS reduces transitions by up to 63% (mean 54%), with an average step-count error below 5%
[Figure: per-person result plots (values in %) for Person IDs 1, 2, 3, 4, 7, 8, 9, 14, 16, 34, 35, 36]
STC: Serial Transition Count
FSR: Full-Scale Range
Results Summary
Applying VDBS to accelerometer data for 12 users (~4.6hrs of walking at 20Hz sampling)
3-axis accelerometer data from WISDM activity recognition dataset: J. R. Kwapisz, et al. “Activity recognition using cell phone accelerometers”. SIGKDD Explor. Newsl., 12(2):74–82, Mar. 2011.
Verilog Example: VDBS Encoder
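For intuition only, an encoder of this kind can be modeled behaviorally as an exhaustive search. The sketch below is not the lecture's encoder and not the output of any generator tool: it assumes one particular policy (fewest transitions among values within the tolerable error m, ties broken by smallest deviation), is written for simulation rather than synthesis, and uses hypothetical names throughout.

```verilog
module vdbsSketch;
	// Number of adjacent bit positions of an 8-bit word whose values differ.
	function [3:0] countTransitions(input [7:0] word);
		reg [6:0] diff;
		integer i;
		begin
			diff = word[7:1] ^ word[6:0];
			countTransitions = 0;
			for (i = 0; i < 7; i = i + 1)
				countTransitions = countTransitions + diff[i];
		end
	endfunction

	// Exhaustive search over all 8-bit candidates t with |s - t| <= m.
	// Prefers fewer transitions; breaks ties by smaller deviation from s.
	function [7:0] encode(input [7:0] s, input [7:0] m);
		integer t, deviation, bestDeviation;
		reg [3:0] count, bestCount;
		begin
			encode = s;				// s itself always satisfies the bound
			bestCount = countTransitions(s);
			bestDeviation = 0;
			for (t = 0; t < 256; t = t + 1) begin
				deviation = (t > s) ? (t - s) : (s - t);
				count = countTransitions(t[7:0]);
				if ((deviation <= m) && ((count < bestCount) ||
				    ((count == bestCount) && (deviation < bestDeviation)))) begin
					encode = t[7:0];
					bestCount = count;
					bestDeviation = deviation;
				end
			end
		end
	endfunction

	initial
		$display("%0d", encode(8'd64, 8'd13));	// prints 63
endmodule
```

A hardware encoder would replace the search with combinational logic derived offline for a fixed word length and error bound.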
The Same VDBS Encoder in VHDL
vdbs2rtl generates VHDL or Verilog for any desired VDBS encoder configuration
8-bit encoder: 61 LUTs / 865 gates; 12-bit encoder: 100 LUTs / 12868 gates (iCECube2 and Yosys)
http://physcomp.eng.cam.ac.uk/demos.html
[SMR2016c] P. Stanley-Marbell, P. A. Francese, and M. Rinard. “Encoder Logic for Reducing Serial I/O Power in Sensors and Sensor Hubs”, IEEE Hot Chips, 2016.
Something Different: Programmable Analog
MAX11300 Field-Programmable Analog Array (sort-of…)
[Figure: block diagram of relevant internal subsystem of each MAX11300 IC. An SPI interface from the microcontroller (MISO, MOSI, SCLK, /CS) provides digital control of a programmable analog matrix with ADC, DAC, switch, and GPIO functions; analog inputs and analog outputs are each labeled 0 through 9]
Something Different: More Programmable Analog
Toshiba Field-Programmable Analog Array
(Evaluation FPAA hardware courtesy of Toshiba Japan)
Component                   Quantity
Comparators                 4
Operational Amplifiers      4
Voltage Regulator           1 (2.5V)
Bandgap Reference (BGR)     1
Constant Current Source     1
Logic Gates                 2 NOTs, 3 NANDs, 3 NORs
Other Circuit Elements      6 PMOS, 6 NMOS, 20 × 20kΩ resistors, 2 × 10pF capacitors
Further Reading
The Verilog Language Reference Manual:
See the Verilog language reference, Verilog-LRM-IEEE-Std-1364-2001.pdf
A Good Verilog Tutorial
https://verilogguide.readthedocs.io/en/latest/verilog/overview.html
Other Good Verilog Resources
http://chiphack.org/talks/basic-verilog/html/index.html
http://chiphack.org/talks/combinatorial-verilog/html/index.html
http://chiphack.org/talks/sequential-verilog/html/index.html
Best next step: Get some practice and test your understanding
▶︎ Like learning to swim, you can’t learn all you need from a textbook
▶︎ Complete these online self-assessments on https://f-of-e.org/
https://f-of-e.org/chapter-06/#exercises
Things to Do
Complete a “muddiest point” two-question survey using this link