Deep learning

cooldudetta's version from 2017-09-17 22:22

Machine learning

Searching for useful representations of some input data, within a predefined space of possibilities, using guidance from a feedback signal.
It automatically finds transformations that turn the data into more useful representations for a given task.

Deep learning

subset of machine learning that uses successive layers to create increasingly meaningful representations of data
it builds models of that data, and the number of layers that contribute to a model is called the "depth" of the model
those layers of representations are all learned automatically from exposure to training data, via models called "neural networks"
what a layer does to its input data is specified by the layer's "weights"
"learning" means finding a set of values for the weights of every layer in the neural network such that the network correctly maps the example inputs to their associated targets
the "loss function" of the network measures how far the predicted output is from the expected output
the loss function computes a "distance score" (the loss score), which is used as a feedback signal to adjust the values of the weights by a little bit
after the loss score is calculated, it goes into an "optimizer" that adjusts the weights to improve the predictions
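The flow above (layer transformation, loss score, feedback) can be sketched with plain NumPy. The layer sizes, input values, and the mean-squared-error loss here are illustrative assumptions, not anything prescribed by these notes:

```python
import numpy as np

def relu(x):
    # Element-wise rectifier: negative values become 0.
    return np.maximum(x, 0)

# Hypothetical small layer: 4 inputs -> 3 outputs.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 4))  # random initialization, small values
b = np.zeros(3)

x = np.array([1.0, -2.0, 0.5, 3.0])     # one example input
output = relu(np.dot(W, x) + b)         # the layer's transformation of its input

# Mean-squared-error loss against a made-up target: the "distance score"
# that would be fed back to an optimizer to adjust W and b.
target = np.array([0.5, 0.0, 1.0])
loss = np.mean((output - target) ** 2)
print(output, loss)
```

In a real framework the optimizer, not hand-written code, would use this loss score to nudge W and b.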


Question Answer

Relu expression: output = relu(dot(W, input) + b). Each neural network layer transforms its input data according to an equation of this form.
Random initialization: the weight matrices (W and b in the relu expression) are filled with small random values to begin with.
Training the system: gradually adjust the weights based on a feedback signal.
Training loop: repeat the following: 1. draw a batch of training samples x and corresponding targets y; 2. run the network on x (the "forward pass") to obtain predictions y_pred; 3. compute the "loss" of the network on the batch, a measure of the mismatch between y_pred and y; 4. update all weights of the network in a way that slightly reduces the loss on this batch.
Derivative: rate of change; the amount by which a function is changing at a given point (dy/dx).
Gradient: the derivative of a tensor operation; the generalization of the derivative to functions of multi-dimensional inputs (functions that take tensors as inputs).
Training loop step 4: 1. compute the gradient of the loss with regard to the network's parameters (the "backward pass"), which is much faster than estimating the loss change one weight at a time; 2. move the parameters a little in the direction opposite to the gradient (W -= step * gradient), lowering the loss on the batch by a bit.
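The four training-loop steps and the gradient-descent update can be shown end to end on a toy model. The single-weight linear model, the data (y = 2x + 1), the step size, and the analytically derived gradients are all illustrative assumptions made for this sketch:

```python
import numpy as np

# Toy data: the "network" is y_pred = w*x + b, and the target rule is y = 2x + 1.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1

w, b = 0.0, 0.0   # initialization of the parameters
step = 0.1        # how far to move against the gradient each iteration

for _ in range(200):                        # the training loop
    y_pred = w * x + b                      # 1-2. forward pass on the batch
    loss = np.mean((y_pred - y) ** 2)       # 3. loss: mismatch of y_pred and y
    grad_w = np.mean(2 * (y_pred - y) * x)  # 4a. backward pass: gradient of the
    grad_b = np.mean(2 * (y_pred - y))      #     loss w.r.t. each parameter
    w -= step * grad_w                      # 4b. move opposite to the gradient,
    b -= step * grad_b                      #     slightly reducing the loss

print(w, b)  # drifts toward 2 and 1 as the loss shrinks
```

Here the gradients are worked out by hand; a deep learning framework computes them automatically for every layer during the backward pass.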