# Deep learning

cooldudetta's version from 2017-09-17 22:22
## Machine learning

Machine learning searches for useful representations of some input data, within a predefined space of possibilities, using guidance from a feedback signal.

It automatically finds transformations that turn the data into representations that are more useful for a given task.

## Deep learning

A subset of machine learning that uses successive layers to create increasingly meaningful representations of data.

It builds models of that data; the number of layers that contribute to a model is called the "depth" of the model.

Those layers of representation are learned automatically from exposure to training data, via models called "neural networks".

Every layer is parameterized by "weights", which specify what the layer does to its input data.

"Learning" means finding a set of values for the weights of every layer in the network such that the network correctly maps example inputs to their associated targets.
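A single weighted layer transformation can be sketched with NumPy; the names `relu` and `dense_forward` and the layer sizes are illustrative, not from the notes:

```python
import numpy as np

def relu(x):
    # element-wise max(x, 0)
    return np.maximum(x, 0)

def dense_forward(W, b, x):
    # one layer's transformation: relu(W . x + b)
    return relu(np.dot(W, x) + b)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 4))  # small random weights
b = np.zeros(3)                          # bias vector
x = rng.normal(size=4)                   # one input sample
y = dense_forward(W, b, x)               # shape (3,), all entries >= 0
```

Stacking several such transformations, each with its own `W` and `b`, is what gives a network its depth.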

The "loss function" of the network measures how far the predicted output is from the expected output.

The loss function produces a "distance score" (the loss score), which is used as a feedback signal to adjust the values of the weights a little at a time.
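As a concrete example, mean squared error is one common choice of loss function (the notes don't name a specific one); it scores the distance between predictions and targets:

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # mean squared error: average squared distance
    # between predictions and targets
    return np.mean((y_pred - y_true) ** 2)

loss = mse_loss(np.array([0.9, 0.2]), np.array([1.0, 0.0]))  # ≈ 0.025
```

The closer the predictions get to the targets, the smaller the loss score, so driving it down is the training objective.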

After the loss score is calculated, it goes into an "optimizer" that adjusts the weights to improve the predictions.
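A minimal sketch of what such an optimizer does, assuming plain gradient descent (real optimizers such as RMSProp or Adam are more elaborate):

```python
import numpy as np

def sgd_update(weights, gradient, step=0.01):
    # move each weight a small step opposite its gradient,
    # which slightly reduces the loss on the current batch
    return weights - step * gradient

W = np.array([0.5, -0.3])
grad = np.array([1.0, -2.0])
W = sgd_update(W, grad, step=0.1)  # ≈ [0.4, -0.1]
```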

## Terms

Question | Answer |
---|---|
ReLU expression | `output = relu(dot(W, input) + b)`. Each neural network layer transforms its input data according to an equation of this form. |
Random initialization | the weight matrix `W` and bias vector `b` are filled with small random values to begin with |
Training the system | gradually adjust the weights based on a feedback signal |
Training loop | repeat the following: 1. draw a batch of training samples `x` and corresponding targets `y`; 2. run the network on `x` (the "forward pass") to obtain predictions `y_pred`; 3. compute the "loss" of the network on the batch, a measure of the mismatch between `y_pred` and `y`; 4. update all weights of the network in a way that slightly reduces the loss on this batch |
Derivative | rate of change: the amount by which a function is changing at a given point (dy/dx) |
Gradient | the derivative of a tensor operation; the generalization of the derivative to functions of multi-dimensional inputs (functions that take tensors as inputs) |
Training loop step 4 | 1. compute the gradient of the loss with regard to the network's parameters (the "backward pass"), which is much faster than estimating the gradient numerically; 2. move the parameters a little in the direction opposite to the gradient (`W -= step * gradient`), thus lowering the loss on the batch by a bit |
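The whole training loop can be put together in a toy end-to-end sketch: a one-weight, one-bias model learning y = 2x with MSE loss, where hand-derived gradients stand in for the backward pass (all names and numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=32)   # batch of training samples
y = 2.0 * x                        # corresponding targets

w, b = 0.0, 0.0                    # initial parameter values
lr = 0.1                           # step size

for _ in range(200):
    y_pred = w * x + b                     # 1-2. forward pass
    loss = np.mean((y_pred - y) ** 2)      # 3. loss score (MSE)
    # 4a. backward pass: gradients of MSE w.r.t. w and b
    grad_w = 2 * np.mean((y_pred - y) * x)
    grad_b = 2 * np.mean(y_pred - y)
    # 4b. move opposite the gradient
    w -= lr * grad_w
    b -= lr * grad_b

# after training, w is close to 2 and b close to 0
```

Each pass through the loop lowers the loss a little; real frameworks compute the gradients automatically via backpropagation instead of by hand.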
