Hi, I'm Nicholas Johnson!

software engineer / trainer / AI enthusiast

Syntax and Terminology

In this section, we’re going to run through most of the basic syntax and terminology you’ll need to read the mathematical equations used in machine learning. This will give you a foundation for the rest of the book.

Don’t worry if not everything makes sense yet. You’ll probably wish to refer back to this chapter as you move on.


Variables in Maths are not like variables in programming. In software, I might write:

a = a + 1;

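I can do this because a names a location in memory, and assignment simply overwrites the value stored there.

This makes no sense in mathematics, where a = a + 1 would claim that a number equals itself plus one. Variables in Maths stand for values, either known or waiting to be discovered. You can’t just set them to whatever you want. If I write: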

$$a = 12$$

$a$ is now 12 for the duration of the equation.
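To make the contrast concrete, here is a small Python sketch. In code, `=` is assignment, so `a = a + 1` overwrites `a`. The closest Python gets to a mathematical "equals" is the comparison operator `==`, and `b == b + 1` comes out false for an ordinary integer.

```python
# In programming, "=" means "store this value in a", so the old value is overwritten.
a = 12
a = a + 1
print(a)  # 13

# In maths, "=" is a claim that both sides are equal.
# The nearest Python equivalent is the comparison operator "==".
b = 12
print(b == 12)     # True: the statement b = 12 holds
print(b == b + 1)  # False here; in maths no real number equals itself plus one
```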

Capitals, Bold and Italic in Variable Names

  • $x$ - lower case italic letters are used for variables representing scalar values, e.g. $5$.
  • $\mathbf{x}$ - bold lower case letters are used for vectors, e.g. $[1, 2, 3]$.
  • $\mathbf{X}$ - bold capitals are used to represent matrices and tensors, e.g. $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.

This isn’t a universal rule in Maths, but it has become a common convention in machine learning.

Scalars, Vectors, Matrices and Tensors

In Maths we have a lot of words for arrays. This is for historical reasons: different people, writing at different times, used different words for related concepts.


A scalar is a single number. If it helps, you can think of it as a zero-dimensional array or tensor; the maths still works out.

Here’s a scalar in action:

$$x = 5$$


Vectors are one-dimensional arrays, just like the arrays you use in code. They even share the same square-bracket and comma syntax, which is nice:

$$\mathbf{x} = [1, 2, 3]$$


Matrices are two-dimensional arrays. A matrix is an array of arrays, usually written as a grid and optionally surrounded by square brackets:

$$\mathbf{X} = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55}
\end{bmatrix}$$
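In code, a matrix is usually stored as a list of rows. Here is a minimal Python sketch using plain nested lists, so no library is assumed, and the values are made up for illustration. One thing to watch: maths subscripts such as $a_{23}$ count from 1 (row 2, column 3), while Python indexes from 0, so $a_{23}$ lives at `X[1][2]`.

```python
# A made-up 3x3 matrix stored as a list of rows (an array of arrays).
X = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

rows = len(X)     # number of rows
cols = len(X[0])  # number of columns in the first row

# Maths subscripts start at 1, Python indices start at 0,
# so the element a_23 (row 2, column 3) is X[1][2].
a_23 = X[1][2]
print(rows, cols, a_23)  # 3 3 6
```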


Tensors are N-dimensional arrays. A one-dimensional tensor is a vector, and a two-dimensional tensor is a matrix.

Tensors are hard to draw on paper, but we can have a go at showing a 3D tensor as a vector of matrices:

$$\mathbf{X} = \left[ \begin{array}{c}
\begin{bmatrix}
x_{111} & x_{112} & x_{113} & x_{114} & x_{115} \\
x_{121} & x_{122} & x_{123} & x_{124} & x_{125} \\
x_{131} & x_{132} & x_{133} & x_{134} & x_{135} \\
x_{141} & x_{142} & x_{143} & x_{144} & x_{145} \\
x_{151} & x_{152} & x_{153} & x_{154} & x_{155}
\end{bmatrix} \\
\begin{bmatrix}
x_{211} & x_{212} & x_{213} & x_{214} & x_{215} \\
x_{221} & x_{222} & x_{223} & x_{224} & x_{225} \\
x_{231} & x_{232} & x_{233} & x_{234} & x_{235} \\
x_{241} & x_{242} & x_{243} & x_{244} & x_{245} \\
x_{251} & x_{252} & x_{253} & x_{254} & x_{255}
\end{bmatrix} \\
\vdots \\
\begin{bmatrix}
x_{511} & x_{512} & x_{513} & x_{514} & x_{515} \\
x_{521} & x_{522} & x_{523} & x_{524} & x_{525} \\
x_{531} & x_{532} & x_{533} & x_{534} & x_{535} \\
x_{541} & x_{542} & x_{543} & x_{544} & x_{545} \\
x_{551} & x_{552} & x_{553} & x_{554} & x_{555}
\end{bmatrix}
\end{array} \right]$$

If we want to state that a variable $\mathbf{Y}$ holds a $5 \times 5 \times 5 \times 5$ tensor without drawing it, we can say that $\mathbf{Y}$ is a member of $\mathbb{R}^{5 \times 5 \times 5 \times 5}$, the set of all $5 \times 5 \times 5 \times 5$ tensors of real numbers. More on sets later.

$$\mathbf{Y} \in \mathbb{R}^{5 \times 5 \times 5 \times 5}$$
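Set notation like $\mathbb{R}^{5 \times 5 \times 5 \times 5}$ records the shape of a tensor. As a rough sketch in plain Python, we can build a nested-list tensor of a given shape and read that shape back. The helpers `zeros` and `shape_of` are hypothetical, made up for this example, and are not from any library.

```python
def zeros(shape):
    """Hypothetical helper: build a nested list of 0.0 values with the given shape."""
    if not shape:
        return 0.0  # a zero-dimensional tensor is just a scalar
    return [zeros(shape[1:]) for _ in range(shape[0])]


def shape_of(t):
    """Hypothetical helper: read the shape of a regular nested list via its first elements."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        if not t:  # an empty dimension has nothing further to walk into
            break
        t = t[0]
    return tuple(dims)


Y = zeros((5, 5, 5, 5))       # Y is in R^(5 x 5 x 5 x 5)
print(shape_of(Y))            # (5, 5, 5, 5)
print(shape_of(zeros((3,))))  # (3,)  a vector
print(shape_of(5.0))          # ()    a scalar
```

Libraries such as PyTorch, TensorFlow and JAX store tensors far more efficiently than nested lists, but the idea of a shape is the same.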

When writing code, we use tensors most of the time, regardless of how many dimensions we need. You may have heard of a library called TensorFlow; PyTorch and JAX also work in terms of tensors.

Single or Double Vertical Bars around vectors denote the length of the vector ||

Single or double bars around a vector denote its length (also called its magnitude). The two notations are used interchangeably; which one you see depends on the author and the context.

To find the length (or magnitude) of a vector $\mathbf{w}$ given by $\mathbf{w} = [1, 2]$, you can use the Pythagorean formula for the magnitude of a 2D vector:

$$|\mathbf{w}| = \|\mathbf{w}\| = \sqrt{w_1^2 + w_2^2}$$


  • $w_1$ is the first component of the vector (in this case, 1).
  • $w_2$ is the second component of the vector (in this case, 2).

Plugging in the values from vector $\mathbf{w}$:

$$\begin{aligned}
|\mathbf{w}| &= \sqrt{1^2 + 2^2} \\
|\mathbf{w}| &= \sqrt{1 + 4} \\
|\mathbf{w}| &= \sqrt{5}
\end{aligned}$$

So, the magnitude (or length) of vector $\mathbf{w}$ is $\sqrt{5}$.
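The same recipe works for vectors with any number of components: square each component, add them up, and take the square root. Here is a minimal Python sketch. The function name `vector_length` is hypothetical, chosen for illustration. The standard library's `math.hypot` computes the same quantity, and accepts more than two arguments from Python 3.8 onwards.

```python
import math


def vector_length(v):
    """Hypothetical helper: square root of the sum of the squared components."""
    return math.sqrt(sum(component ** 2 for component in v))


w = [1, 2]
print(vector_length(w))       # sqrt(5), about 2.236
print(math.hypot(*w))         # same value, up to floating-point rounding
print(vector_length([3, 4]))  # 5.0
```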

Single Vertical Bars around scalars are absolute values |

Around a scalar, single bars mean the absolute value: the size of the number, ignoring its sign.

$$a = -5$$
$$|a| = 5$$

We can also write this as a function:

$$\text{abs}(a) = 5$$

We can define this like so:

$$|a| = \begin{cases} a & \text{if } a \geq 0 \\ -a & \text{if } a < 0 \end{cases}$$

Occasionally you will also see this:

$$|a| = \sqrt{a^2}$$
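All three definitions give the same answer. A short Python sketch compares the piecewise definition, the square-root form and Python's built-in `abs`. The two function names are hypothetical, chosen only for this example.

```python
import math


def abs_piecewise(a):
    """Hypothetical helper: |a| by cases, a if a >= 0, otherwise -a."""
    return a if a >= 0 else -a


def abs_sqrt(a):
    """Hypothetical helper: |a| as the square root of a squared (returns a float)."""
    return math.sqrt(a ** 2)


a = -5
print(abs_piecewise(a))  # 5
print(abs_sqrt(a))       # 5.0
print(abs(a))            # 5, Python's built-in absolute value
```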

Vertical bars in sets mean “such that”

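In set-builder notation, a single vertical bar separates a variable from the condition it must satisfy, and is read as “such that”. For example, $\{x \in \mathbb{R} \mid x > 0\}$ reads “the set of all real numbers $x$ such that $x$ is greater than 0”.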
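Python’s set comprehensions are modelled on set-builder notation, which makes them a handy way to read the bar. A sketch, using a made-up pool of integers from -5 to 5, since code can’t enumerate an infinite set such as all real numbers. The `if` clause plays the role of the condition after the bar.

```python
# Set-builder notation: { x in S | x > 0 }, "the set of x in S such that x > 0".
# Here S is a small, made-up set of integers from -5 to 5.
S = set(range(-5, 6))

positives = {x for x in S if x > 0}  # the "if" clause plays the role of the bar
print(sorted(positives))  # [1, 2, 3, 4, 5]
```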

TODO: Find real world example of this.


A hat, as in $\hat{y}$, means a prediction.

TODO: Create formula

We often use yHat as a variable name for our current prediction. It’s not necessarily the final value, just what the model predicts right now, in the current training epoch.
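Here is a small Python sketch of how this tends to look in code. The model is a hypothetical straight line, $\hat{y} = wx + b$, with made-up data and parameter values. The point is only that `y_hat` holds the model’s current guesses, which we compare against the true values `y`.

```python
# Hypothetical training data: inputs x and the true targets y.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

# Hypothetical current model parameters, partway through training.
w, b = 1.5, 0.0

# y_hat ("y hat") holds the model's current predictions, not the true values.
y_hat = [w * xi + b for xi in x]
print(y_hat)  # [1.5, 3.0, 4.5]

# How far off the current predictions are; training would adjust w and b to shrink this.
errors = [yh - yi for yh, yi in zip(y_hat, y)]
print(errors)  # [-0.5, -1.0, -1.5]
```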


A star, as in $x^*$, means an ideal value: the value we want to get to.
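For example, $w^*$ is often used for the parameter value that minimises a loss. The toy Python sketch below assumes a made-up loss, $L(w) = (w - 3)^2$, and a simple grid search over candidate values. Real training usually uses methods such as gradient descent instead. Here `w_star` is the candidate with the lowest loss.

```python
def loss(w):
    """Made-up loss function, used only for illustration; its minimum is at w = 3."""
    return (w - 3) ** 2


# Candidate values 0.0, 0.5, 1.0, ..., 5.0 to search over.
candidates = [i * 0.5 for i in range(11)]

# w_star ("w star") is the ideal value: the candidate with the smallest loss.
w_star = min(candidates, key=loss)
print(w_star)  # 3.0
```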