Neural Networks Basics

Shashank Shekhar
Jan 17, 2024
6 min read

1. Neurons:

Neural networks are composed of interconnected nodes called neurons.
Neurons receive inputs, apply weights, and produce an output using an activation function.

2. Layers:

Neural networks consist of layers: Input layer, Hidden layer(s), and Output layer.
Each layer can have multiple neurons.

3. Weights and Biases:

Weights determine the strength of connections between neurons.
Biases add an offset to the weighted sum of inputs.

4. Feedforward Process:

Input signals are fed forward through the network.
Activations are calculated at each layer until the final output is produced.
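
The four ideas above can be sketched in plain Python. This is a minimal illustration rather than how a library implements it; the function names (neuron_output, layer_forward, feedforward) and the example weights are made up for the demonstration, and sigmoid is just one possible activation.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """One neuron: weighted sum of the inputs plus a bias, then an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def layer_forward(inputs, layer_weights, layer_biases, activation=sigmoid):
    """One layer: every neuron in the layer sees the same inputs."""
    return [neuron_output(inputs, w, b, activation)
            for w, b in zip(layer_weights, layer_biases)]

def feedforward(inputs, layers):
    """Feed the activations of each layer into the next one."""
    activations = inputs
    for layer_weights, layer_biases in layers:
        activations = layer_forward(activations, layer_weights, layer_biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer: weights, biases
    ([[1.0, -1.0]], [0.0]),                     # output layer: weights, biases
]
output = feedforward([1.0, 1.0], network)
print(output)
```

Each tuple in network holds one layer's weight rows (one row per neuron) and its biases, so adding a layer only means appending another tuple.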

Activation Functions:

1. Sigmoid:

Maps input values to a range between 0 and 1.
Commonly used in the output layer of binary classification problems.

2. Hyperbolic Tangent (tanh):

Similar to the sigmoid but maps input values to a range between -1 and 1.
Useful in scenarios where the output needs to be centered around zero.

3. Rectified Linear Unit (ReLU):

Sets negative values to zero and passes positive values unchanged.
Widely used in hidden layers due to its simplicity and effectiveness.

4. Softmax:

Converts a vector of real values into a probability distribution.
Often used in the output layer for multi-class classification problems.
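
The four activation functions described above are short enough to write out directly. The sketch below uses only the standard library; subtracting the maximum inside softmax is a common numerical-stability trick and is an addition here, not something the text above prescribes.

```python
import math

def sigmoid(z):
    """Maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Maps any real number into (-1, 1), centered on zero."""
    return math.tanh(z)

def relu(z):
    """Sets negative values to zero and passes positive values unchanged."""
    return max(0.0, z)

def softmax(values):
    """Turns a list of real numbers into probabilities that sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))            # 0.5
print(tanh(0.0))               # 0.0
print(relu(-3.0), relu(2.0))   # 0.0 2.0
probs = softmax([1.0, 2.0, 3.0])
print(probs)
```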

Backpropagation:

1. Loss Function:

Measures the difference between the predicted output and the actual target.
Common loss functions include Mean Squared Error (MSE) for regression and Cross-Entropy for classification.
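
As a rough illustration, both losses can be computed by hand. The function names and the sample values below are hypothetical, cross-entropy is shown in its binary form only, and the eps clipping is an added safeguard against taking the logarithm of zero.

```python
import math

def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def binary_cross_entropy(targets, probabilities, eps=1e-12):
    """Average negative log-likelihood for 0/1 targets."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)

print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # (1 + 4) / 2 = 2.5
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

Confident correct predictions give a cross-entropy near zero, while confident wrong ones are penalized heavily.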

2. Gradient Descent:

Optimizes the neural network by minimizing the loss function.
Iteratively adjusts weights and biases in the direction that reduces the loss, moving towards a minimum (which is not guaranteed to be the global one).

3. Forward Pass:

Input is passed through the network to generate predictions.
Activations and outputs at each layer are stored for later use.

4. Backward Pass:

Gradients of the loss with respect to weights and biases are calculated using the chain rule.
Weights and biases are updated using the gradients and a learning rate.
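
To make the chain rule concrete, here is a sketch for the smallest possible case: one linear neuron (prediction = w * x + b) with a squared-error loss. The names and the single training example are hypothetical; real networks repeat the same idea layer by layer.

```python
def forward(x, w, b):
    """Forward pass of a single linear neuron."""
    return w * x + b

def gradients(x, y, w, b):
    """Backward pass for loss = (y_hat - y)^2 using the chain rule."""
    y_hat = forward(x, w, b)
    dloss_dyhat = 2.0 * (y_hat - y)   # derivative of the loss w.r.t. the prediction
    dloss_dw = dloss_dyhat * x        # d(y_hat)/dw = x
    dloss_db = dloss_dyhat * 1.0      # d(y_hat)/db = 1
    return dloss_dw, dloss_db

def train_step(x, y, w, b, learning_rate):
    """One gradient descent update of the weight and the bias."""
    dw, db = gradients(x, y, w, b)
    return w - learning_rate * dw, b - learning_rate * db

w, b = 0.0, 0.0
x, y = 2.0, 10.0  # hypothetical training example
loss_before = (forward(x, w, b) - y) ** 2
w, b = train_step(x, y, w, b, learning_rate=0.05)
loss_after = (forward(x, w, b) - y) ** 2
print(loss_before, loss_after)
```

A single update already lowers the loss on this example; training repeats the step many times over many examples.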

5. Learning Rate:

Determines the size of the step taken during gradient descent.
Too large a learning rate may cause overshooting, while too small may result in slow convergence.
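
The effect of the learning rate is easy to see on a toy problem. The sketch below minimizes f(w) = w^2 (gradient 2w), a function chosen purely for illustration, with three hypothetical learning rates.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w^2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w^2
        w = w - learning_rate * grad
    return w

for lr in (0.001, 0.1, 1.1):
    # Distance from the minimum at w = 0 after 20 steps
    print(lr, abs(descend(lr)))
```

With 0.001 the weight barely moves from its starting point, with 0.1 it gets close to the minimum, and with 1.1 every step overshoots so the distance from the minimum grows.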

6. Epochs and Batches:

Training runs for multiple epochs, where one epoch is a full pass over the training data.
The data is usually divided into batches for efficiency, and the weights are updated once per batch.

7. Stochastic Gradient Descent (SGD):

Estimates the gradient from a subset of the training data in each iteration instead of from the full dataset.
Selects the batches randomly, typically by shuffling the data at the start of every epoch.
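
Epochs, batches, and SGD fit together as two nested loops. The sketch below fits a line with mini-batch SGD in plain Python; the helper names, the noise-free data, and the hyperparameters are all assumptions made for the example.

```python
import random

def iterate_minibatches(data, batch_size, rng):
    """Shuffle the data and yield it in batches (one full pass = one epoch)."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

def sgd_fit(data, epochs=200, batch_size=4, learning_rate=0.1, seed=0):
    """Fit y = w * x + b with mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):                                   # outer loop: epochs
        for batch in iterate_minibatches(data, batch_size, rng):  # inner loop: batches
            grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w                       # one update per batch
            b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data on the line y = 3x + 1
data = [(i / 10.0, 3.0 * (i / 10.0) + 1.0) for i in range(20)]
w, b = sgd_fit(data)
print(w, b)
```

With 20 samples and a batch size of 4, each epoch performs 5 weight updates.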

8. Momentum and Adam Optimization:

Momentum helps accelerate SGD in the relevant direction.
Adam combines ideas from momentum and RMSprop for adaptive learning rates.
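
The two update rules can be written down for a single weight. This is a simplified sketch, not the code any library uses: the function names are hypothetical, the beta and eps values are commonly quoted ones, and the learning rate of 0.1 was picked only so the toy problem converges quickly.

```python
import math

def momentum_step(w, grad, velocity, learning_rate=0.1, beta=0.9):
    """SGD with momentum: keep a running velocity built from past gradients."""
    velocity = beta * velocity - learning_rate * grad
    return w + velocity, velocity

def adam_step(w, grad, m, v, t, learning_rate=0.1,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum-style average (m) plus RMSprop-style scaling (v)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 (gradient 2w) from w = 5 with each optimizer
w_mom, velocity = 5.0, 0.0
w_adam, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    w_mom, velocity = momentum_step(w_mom, 2.0 * w_mom, velocity)
    w_adam, m, v = adam_step(w_adam, 2.0 * w_adam, m, v, t)
print(w_mom, w_adam)
```

Both optimizers end up near the minimum at w = 0; Adam divides each step by the running size of the gradient, which is what makes its learning rate adaptive.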

Let's consider a simple use case: predicting the price of a house based on its size. We'll create a simple neural network for this regression task using Python and the popular deep learning library, TensorFlow:

# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Generate some random data for demonstration
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 50 + 15 * X + np.random.randn(100, 1)

# Build a simple neural network model
model = Sequential()
model.add(Dense(units=1, input_dim=1, activation='linear'))

# Compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')

# Train the model
model.fit(X, y, epochs=50, batch_size=4)

# Make predictions on new data
new_data = np.array([[1.5], [3.0]])
predictions = model.predict(new_data)

# Print the predictions
print("Predictions:")
for i in range(len(new_data)):
    print(f"Input: {new_data[i][0]}, Predicted Price: {predictions[i][0]}")

Explanation:

We generate random data for training our model. In a real-world scenario, you would have a dataset with features (in this case, house size) and corresponding labels (house prices).
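We build a simple neural network using the Sequential API of TensorFlow's Keras. It consists of a single Dense layer with one neuron and a linear activation that maps the input directly to the output, which makes it equivalent to linear regression.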
The model is compiled with stochastic gradient descent (sgd) as the optimizer and mean squared error as the loss function, suitable for regression tasks.
We train the model on our generated data for 50 epochs.
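Finally, we make predictions on new data (new_data) to see how well our model generalizes. Note that 1.5 lies inside the range of training sizes (0 to 2), while 3.0 lies outside it, so the second prediction tests extrapolation.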
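
Because the synthetic data is generated from y = 50 + 15 * X plus noise, a well-trained model should predict values close to that line. The helper below (a hypothetical name, not part of the program above) computes the noise-free prices for the two test inputs as a sanity check.

```python
def noise_free_price(size):
    """Price on the line used to generate the data, without the random noise."""
    return 50 + 15 * size

for size in (1.5, 3.0):
    print(f"Input: {size}, Noise-free Price: {noise_free_price(size)}")
# Input: 1.5, Noise-free Price: 72.5
# Input: 3.0, Noise-free Price: 95.0
```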

Running the program should produce output similar to the following (only the last few epochs are shown, and the exact numbers will vary from run to run):

Epoch 45/50
25/25 [==============================] - 0s 1ms/step - loss: 0.8756
Epoch 46/50
25/25 [==============================] - 0s 2ms/step - loss: 0.8689
Epoch 47/50
25/25 [==============================] - 0s 1ms/step - loss: 0.8600
Epoch 48/50
25/25 [==============================] - 0s 1ms/step - loss: 0.8542
Epoch 49/50
25/25 [==============================] - 0s 2ms/step - loss: 0.8464
Epoch 50/50
25/25 [==============================] - 0s 2ms/step - loss: 0.8432
1/1 [==============================] - 0s 80ms/step
Predictions:
Input: 1.5, Predicted Price: 72.48058319091797
Input: 3.0, Predicted Price: 95.02782440185547

Let's alter the program so that, instead of running for a fixed number of epochs, training stops once the loss improves by less than 0.01 between consecutive epochs:

# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Generate some random data for demonstration
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 50 + 15 * X + np.random.randn(100, 1)

# Build a simple neural network model
model = Sequential()
model.add(Dense(units=1, input_dim=1, activation='linear'))

# Compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')

# Train until the loss improves by less than 0.01 between consecutive epochs
previous_loss = float('inf')  # No previous loss exists before the first epoch
epochs = 0

while True:
    model.fit(X, y, epochs=1, batch_size=4, verbose=0)  # Train for one epoch
    current_loss = model.evaluate(X, y, batch_size=4, verbose=0)  # Loss on the training data
    epochs += 1  # Count the epoch that has just finished

    # Print the loss after each epoch
    print(f"Epoch {epochs}, Loss: {current_loss}")

    # Stop when the improvement is below 0.01 (this also stops if the loss increases)
    if previous_loss - current_loss < 0.01:
        break

    previous_loss = current_loss

# Make predictions on new data
new_data = np.array([[1.5], [3.0]])
predictions = model.predict(new_data)

# Print the predictions
print("\nPredictions:")
for i in range(len(new_data)):
    print(f"Input: {new_data[i][0]}, Predicted Price: {predictions[i][0]}")
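
The stopping rule used in the loop above can be isolated and tried on a list of numbers without training anything. The function name and the loss values below are hypothetical; they only show when the rule fires.

```python
def epochs_until_plateau(losses, min_improvement=0.01):
    """Return the epoch at which the improvement first drops below the threshold."""
    previous_loss = float("inf")
    for epoch, current_loss in enumerate(losses, start=1):
        # Same test as in the training loop: stop on a small (or negative) improvement
        if previous_loss - current_loss < min_improvement:
            return epoch
        previous_loss = current_loss
    return len(losses)  # the threshold was never reached

print(epochs_until_plateau([5.0, 2.0, 1.0, 0.995, 0.99]))  # 4
```

Keras also ships an EarlyStopping callback (with monitor, min_delta, and patience arguments) that can be passed to fit, which may be a more convenient option than a hand-written loop.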

Let's increase the complexity of the problem by considering a multi-feature regression task. In this example, we'll predict the price of a house based on multiple features such as size, number of bedrooms, and number of bathrooms.

# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Generate random data for demonstration
np.random.seed(42)
size = 2 * np.random.rand(100, 1)
bedrooms = np.random.randint(1, 5, size=(100, 1))
bathrooms = np.random.randint(1, 4, size=(100, 1))
X = np.concatenate([size, bedrooms, bathrooms], axis=1)
true_prices = 50 + 15 * size + 2 * bedrooms + 1.5 * bathrooms + np.random.randn(100, 1)

# Build a neural network model with multiple features
model = Sequential()
model.add(Dense(units=10, input_dim=3, activation='relu'))
model.add(Dense(units=1, activation='linear'))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X, true_prices, epochs=100, batch_size=4)

# Make predictions on new data
new_data = np.array([[1.8, 3, 2]])  # Example input with size, bedrooms, and bathrooms
prediction = model.predict(new_data)

# Print the prediction
print(f"Input: {new_data[0]}, Predicted Price: {prediction[0][0]}")

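The output should look similar to the following (only the last few epochs are shown, and the exact numbers will vary from run to run). The loss is still falling by about one unit per epoch, so the model has not fully converged after 100 epochs: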

Epoch 95/100
25/25 [==============================] - 0s 2ms/step - loss: 142.6684
Epoch 96/100
25/25 [==============================] - 0s 1ms/step - loss: 141.7294
Epoch 97/100
25/25 [==============================] - 0s 1ms/step - loss: 140.8277
Epoch 98/100
25/25 [==============================] - 0s 2ms/step - loss: 139.8784
Epoch 99/100
25/25 [==============================] - 0s 1ms/step - loss: 138.7852
Epoch 100/100
25/25 [==============================] - 0s 881us/step - loss: 137.8268
1/1 [==============================] - 0s 76ms/step
Input: [1.8 3. 2. ], Predicted Price: 84.55965423583984

Let's increase the complexity of the problem further by adding two more features, and increase the model complexity to match by adding a second hidden layer:

# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Generate random data for demonstration
np.random.seed(42)
size = 2 * np.random.rand(100, 1)
bedrooms = np.random.randint(1, 5, size=(100, 1))
bathrooms = np.random.randint(1, 4, size=(100, 1))
other_features = np.random.rand(100, 2)
X = np.concatenate([size, bedrooms, bathrooms, other_features], axis=1)
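# Slice with 0:1 and 1:2 to keep shape (100, 1); other_features[:, 0] has shape (100,) and would broadcast the sum to (100, 100)
true_prices = 50 + 15 * size + 2 * bedrooms + 1.5 * bathrooms + 0.5 * other_features[:, 0:1] + 0.8 * other_features[:, 1:2] + np.random.randn(100, 1)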

# Build a more complex neural network model
model = Sequential()
model.add(Dense(units=10, input_dim=5, activation='relu'))
model.add(Dense(units=5, activation='relu'))
model.add(Dense(units=1, activation='linear'))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X, true_prices, epochs=200, batch_size=4, verbose=0)

# Make predictions on new data
new_data = np.array([[1.8, 3, 2, 0.7, 0.9]])  # Example input with size, bedrooms, bathrooms, and other features
prediction = model.predict(new_data)

# Print the prediction
print(f"Input: {new_data[0]}, Predicted Price: {prediction[0][0]}")

In this example:

We generate random data for five features: size, number of bedrooms, number of bathrooms, and two additional features (other_features).
The neural network model now has two hidden layers with ReLU activation, making it more complex.
We compile the model using the Adam optimizer and mean squared error loss.
The model is trained on the generated data for 200 epochs.
We make predictions on new data with multiple features (new_data) to see how well our more complex model performs.
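
One way to quantify "more complex" is to count trainable parameters. The helper below is a hypothetical function that assumes fully connected layers with one bias per neuron, as in the Dense layers used above.

```python
def dense_parameter_count(input_dim, layer_units):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for units in layer_units:
        total += fan_in * units + units  # weights plus one bias per neuron
        fan_in = units                   # this layer's outputs feed the next layer
    return total

print(dense_parameter_count(1, [1]))         # 2   (single-feature model)
print(dense_parameter_count(3, [10, 1]))     # 51  (three-feature model)
print(dense_parameter_count(5, [10, 5, 1]))  # 121 (five-feature model)
```

The five-feature model has more than twice the parameters of the three-feature model while still being trained on only 100 samples.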

To reduce overfitting, a common next step is to add dropout layers to the model.
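
As a rough sketch of what a dropout layer does, the plain-Python function below randomly zeroes activations during training and rescales the survivors (the "inverted dropout" variant). The function name, the rate, and the input values are assumptions for illustration; in Keras the equivalent building block is the Dropout layer from tensorflow.keras.layers.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; scale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off at prediction time
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10  # hypothetical hidden-layer activations
print(dropout(hidden, rate=0.5, rng=rng))
print(dropout(hidden, rate=0.5, rng=rng, training=False))
```

Scaling the surviving activations keeps their expected sum unchanged, so nothing needs to be adjusted when dropout is turned off for prediction.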
