Machine learning

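Machine learning algorithms can be classified according to several criteria: the type of problem they solve, how they use the training data during optimisation, whether they are instance based or model based, and whether the model is linear or non-linear.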

Types of problems

classification
- output of the model is a discrete set of categories
- examples:
  - spam detection
  - pion/proton discrimination
  - positive/negative COVID test
regression
- output of the model is a continuous variable
- examples:
  - value of stock
  - country GDP

The boundary between the two types can be blurred:

when the categories have an ordering we can use regression and bin the result into categories
- A*, A, B, C, ... grades
- number of stars for a review
- energy rating of a building
for classification we can fit a continuous function giving the probability of belonging to one class, which is itself a regression problem
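
Both directions of this blurring can be sketched in a few lines. The bin edges, the 0.5 threshold and the function names below are assumptions chosen for illustration, not values taken from these notes.

```python
from bisect import bisect_right

# Hypothetical bin edges separating 1, 2, 3, 4 and 5 stars on a continuous score.
STAR_EDGES = [1.5, 2.5, 3.5, 4.5]

def score_to_stars(score: float) -> int:
    """Bin a continuous regression output into an ordered category (1 to 5 stars)."""
    # A score exactly on an edge is assigned to the higher category.
    return 1 + bisect_right(STAR_EDGES, score)

def probability_to_class(p: float, threshold: float = 0.5) -> int:
    """Turn a fitted class probability into a discrete label (0 or 1)."""
    return int(p >= threshold)

predicted_scores = [0.7, 2.49, 3.5, 4.9]  # made-up regression outputs
stars = [score_to_stars(s) for s in predicted_scores]
print(stars)  # [1, 2, 4, 5]
```

The regression model itself is unchanged in both cases; only the final step that maps its continuous output to a category differs.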

batch learning
- the entire training set is used for each iteration of the model optimisation
online learning
- the model is updated for each new training example
mini-batch learning
- the model is optimised on subsets of the training set
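
The three ways of using the training set described above differ only in how many examples enter each update. The following is a minimal sketch, assuming a one-parameter model $y = w x$ trained by gradient descent on the mean squared error; the toy data, the learning rate and the function names are made up for illustration. Updating on one example at a time from a fixed data set is used here as a stand-in for examples arriving one by one.

```python
import random

def gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(data, batch_size, epochs=500, lr=0.05, seed=0):
    """Gradient descent; batch_size selects how much data each update uses."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            w -= lr * gradient(w, batch)
    return w

# Toy data generated from y = 3x (hypothetical, noise free).
data = [(x / 10, 3 * x / 10) for x in range(1, 11)]

w_full = train(data, batch_size=len(data))  # entire training set per update
w_single = train(data, batch_size=1)        # one training example per update
w_subset = train(data, batch_size=4)        # subsets of the training set
```

On this noise-free data all three settings approach $w = 3$; they differ in the number of updates per pass over the data and in how noisy each individual update is.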

instance based
- uses examples to learn
- need "similarity" measure to compare new data to training data
model based
- we use a model to quantify the relationship between the data
- the data fixes the parameters of the model

Example: predicting the final grade $g_4$ of a student given their 1st, 2nd and 3rd year results $g_1$, $g_2$ and $g_3$.

instance-based:
- look at historical results and find the past student whose marks are closest to those of the student whose result we want to predict
- use the final grade of the past student as the prediction for the new student
- could look at a set of historical students and average
model-based:
- we can hypothesize a linear dependency:
$$ g_4 = c_1 g_1 + c_2 g_2 + c_3 g_3 $$
- fit the coefficients $c_1$, $c_2$, $c_3$ to historical data and use them to predict the new student's final grade.
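
Both approaches to the grade example can be sketched side by side. The historical marks below are invented, Euclidean distance is assumed as the similarity measure, and the coefficients are fitted by ordinary least squares through the normal equations; none of these choices is prescribed by the notes, and all function names are hypothetical.

```python
import math

# Hypothetical historical records: (g1, g2, g3, g4), marks out of 100.
history = [
    (55, 60, 62, 61),
    (70, 68, 75, 72),
    (82, 85, 80, 83),
    (45, 50, 48, 49),
    (65, 72, 70, 70),
]

def predict_instance_based(new, history, k=1):
    """Average the final grades of the k past students with the closest marks."""
    nearest = sorted(history, key=lambda row: math.dist(row[:3], new))[:k]
    return sum(row[3] for row in nearest) / k

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_model_based(history):
    """Least-squares fit of c1, c2, c3 in g4 = c1*g1 + c2*g2 + c3*g3."""
    xtx = [[sum(row[i] * row[j] for row in history) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * row[3] for row in history) for i in range(3)]
    return solve(xtx, xty)

new_student = (68, 70, 73)
nearest_grade = predict_instance_based(new_student, history, k=1)
averaged_grade = predict_instance_based(new_student, history, k=3)
c1, c2, c3 = fit_model_based(history)
model_grade = c1 * new_student[0] + c2 * new_student[1] + c3 * new_student[2]
```

The instance-based prediction needs the whole history at prediction time, whereas the model-based one only needs the three fitted coefficients once the fit is done.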

linear models
- perceptron
- logistic regression
- SVM (support vector machine)
- ...
non-linear models
- polynomial features
- neural networks
- ...
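
One way to see the difference between the two families is to train the same perceptron with and without a polynomial feature. The sketch below uses the XOR problem, which is an illustrative choice not taken from these notes: no straight line separates its two classes, but adding the product $x_1 x_2$ as a third feature makes them linearly separable. The function names are hypothetical.

```python
def train_perceptron(samples, max_epochs=1000, lr=1.0):
    """Classic perceptron update rule; returns (weights, bias)."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if score > 0 else 0
            if pred != y:
                mistakes += 1
                w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
                b += lr * (y - pred)
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b

def accuracy(samples, w, b):
    """Fraction of samples classified correctly by the linear rule (w, b)."""
    hits = 0
    for x, y in samples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += int((1 if score > 0 else 0) == y)
    return hits / len(samples)

# XOR: the label is 1 when exactly one of the two inputs is 1.
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Same data with one extra polynomial feature, the product x1 * x2.
xor_poly = [((x1, x2, x1 * x2), y) for (x1, x2), y in xor]

acc_linear = accuracy(xor, *train_perceptron(xor))
acc_poly = accuracy(xor_poly, *train_perceptron(xor_poly))
```

The model on the extended features is still linear in its weights, but its decision boundary is non-linear in the original inputs $x_1$ and $x_2$.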