Tanh Function in Neural Networks
The tanh function, short for hyperbolic tangent function, is another commonly used activation function in neural networks. It maps any real-valued number into a value between -1 and 1. This function is similar to the sigmoid function but offers some advantages that make it more suitable for certain applications.
Mathematical Definition
The tanh function is defined as:
$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
where e is the base of the natural logarithm (approximately equal to 2.71828).
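As a quick numerical check of the definition, the sketch below evaluates the formula directly and compares it with NumPy's built-in np.tanh. The helper name tanh_from_definition is hypothetical and exists only for illustration; the direct formula is safe only for moderate inputs, because np.exp overflows when |x| is large.

```python
import numpy as np

def tanh_from_definition(x):
    """Evaluate tanh directly from its exponential definition.

    Illustrative only: np.exp overflows for large |x|, so np.tanh is the
    better choice in real code.
    """
    x = np.asarray(x, dtype=float)
    e_pos = np.exp(x)
    e_neg = np.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)

x = np.linspace(-5.0, 5.0, 11)
manual = tanh_from_definition(x)
builtin = np.tanh(x)

print(np.allclose(manual, builtin))   # do the two implementations agree?
print(manual.min(), manual.max())     # smallest and largest outputs on this grid
```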
Characteristics of the Tanh Function
- Range: The output values of the tanh function lie strictly between -1 and 1. The function approaches these limits asymptotically but never reaches them.
- Shape: It has an S-shaped curve, similar to the sigmoid function but centered around zero.
- Monotonicity: It is strictly increasing over its entire domain, so a larger input always produces a larger output.
- Derivative: The derivative of the tanh function is given by:
$$\tanh'(x) = 1 - \tanh^2(x)$$
Because the derivative depends only on the output tanh(x), backpropagation can reuse the activation already computed in the forward pass instead of evaluating the exponentials again.
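The derivative formula can be verified numerically. The following is a minimal sketch, assuming a central finite difference with step h = 1e-5 as the independent reference; the function name tanh_derivative_from_output is hypothetical.

```python
import numpy as np

def tanh_derivative_from_output(a):
    """Derivative of tanh written in terms of its output a = tanh(x)."""
    return 1.0 - a ** 2

x = np.linspace(-3.0, 3.0, 13)
a = np.tanh(x)                       # forward-pass activations
analytic = tanh_derivative_from_output(a)

# Central finite difference as an independent check of the formula.
h = 1e-5
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)

max_error = np.max(np.abs(analytic - numeric))
print(max_error)                     # small; limited by finite-difference error
```

Since every value of the derivative is positive, the same check also illustrates that tanh is strictly increasing.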
Advantages of the Tanh Function
- Zero-Centered Output: The outputs of the tanh function are zero-centered: they range from -1 to 1 and are symmetric about zero, so activations can be either positive or negative. This can make optimization easier, because gradient updates tend to zig-zag less than with outputs that are always positive, such as those of the sigmoid.
- Stronger Gradients: Near zero, the tanh function has steeper gradients than the sigmoid function. Its derivative peaks at 1 at x = 0, whereas the sigmoid's derivative peaks at 0.25. This can result in faster convergence during training.
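Both advantages can be seen numerically. The sketch below compares tanh with a hand-written sigmoid on a symmetric grid of inputs; the grid bounds and the helper sigmoid are choices made for this illustration, not part of the original example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6.0, 6.0, 1201)     # symmetric grid around zero

tanh_out = np.tanh(x)
sig_out = sigmoid(x)

tanh_grad = 1.0 - tanh_out ** 2      # derivative of tanh
sig_grad = sig_out * (1.0 - sig_out) # derivative of sigmoid

# Zero-centering: tanh averages to roughly 0, sigmoid to roughly 0.5.
print("mean output   tanh:", tanh_out.mean(), " sigmoid:", sig_out.mean())

# Gradient strength: tanh peaks near 1, sigmoid near 0.25.
print("peak gradient tanh:", tanh_grad.max(), " sigmoid:", sig_grad.max())

# tanh is a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
print(np.allclose(tanh_out, 2.0 * sigmoid(2.0 * x) - 1.0))
```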
Disadvantages of the Tanh Function
- Vanishing Gradient Problem: Like the sigmoid function, the tanh function also suffers from the vanishing gradient problem for very high or very low input values, where the gradient approaches zero. This can slow down the training of deep networks.
- Computationally Expensive: The tanh function involves exponentials, which can be computationally expensive compared to simpler functions like ReLU.
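To make the vanishing gradient concrete, the sketch below prints the local gradient of tanh at a few inputs and then multiplies the same factor across several layers. The pre-activation value and the depth are hypothetical, and the weights are ignored, which is a deliberate simplification.

```python
import numpy as np

def tanh_grad(x):
    """Local gradient of tanh at input x."""
    return 1.0 - np.tanh(x) ** 2

# The local gradient shrinks quickly as |x| grows.
for value in [0.0, 2.0, 5.0, 10.0]:
    print(value, tanh_grad(value))

# Backpropagation multiplies one such factor per layer. Weights are ignored
# here (a simplifying assumption), so this isolates the activation's effect.
pre_activation = 2.5   # hypothetical pre-activation shared by every layer
depth = 10             # hypothetical number of stacked tanh layers
chained = tanh_grad(pre_activation) ** depth
print(chained)         # many orders of magnitude smaller than one factor
```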
Applications
The tanh function is used in the following scenarios:
- Hidden Layers: Often used in hidden layers of neural networks due to its zero-centered output and stronger gradients compared to sigmoid.
- Recurrent Neural Networks (RNNs): Commonly used in RNNs and Long Short-Term Memory (LSTM) networks, where its bounded, zero-centered output keeps hidden-state values within (-1, 1) as the state is updated at each time step.
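As an illustration of the recurrent case, here is a minimal sketch of a vanilla RNN update, h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h). The sizes, the weight names (W_xh, W_hh, b_h), and the random input sequence are all assumptions made for this example; an LSTM adds gates on top of this idea and is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, seq_len = 3, 4, 5   # hypothetical sizes

# Hypothetical parameters, scaled to keep tanh away from saturation at the start.
W_xh = rng.standard_normal((input_size, hidden_size)) * np.sqrt(1 / input_size)
W_hh = rng.standard_normal((hidden_size, hidden_size)) * np.sqrt(1 / hidden_size)
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One vanilla RNN update: h_t = tanh(x_t W_xh + h_prev W_hh + b_h)."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

xs = rng.standard_normal((seq_len, input_size))  # made-up input sequence
h = np.zeros(hidden_size)                        # initial hidden state
states = []
for x_t in xs:
    h = rnn_step(x_t, h)
    states.append(h)

states = np.stack(states)
print(states.shape)              # one hidden state per time step
print(np.abs(states).max())      # tanh keeps every entry bounded by 1
```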
Example in Neural Networks
Let’s see how the tanh function can be used in a simple neural network.
Example Code
import numpy as np

# Define the tanh function
def tanh(x):
    return np.tanh(x)

# Initialize parameters
def initialize_parameters(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size) * np.sqrt(1 / input_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size) * np.sqrt(1 / hidden_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2

# Forward propagation using tanh activation
def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = tanh(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = tanh(Z2)
    return Z1, A1, Z2, A2

# Example usage
input_size = 3   # Number of input features
hidden_size = 4  # Number of neurons in the hidden layer
output_size = 1  # Number of output neurons

# Initialize parameters
W1, b1, W2, b2 = initialize_parameters(input_size, hidden_size, output_size)

# Input data (example)
X = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]])

# Forward propagation
Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)

# Print the outputs
print("Z1:", Z1)
print("A1:", A1)
print("Z2:", Z2)
print("A2:", A2)
Explanation of the Code
- Tanh Function: A thin wrapper around np.tanh that maps any input x to a value between -1 and 1.
- Parameter Initialization: Weights are drawn from a standard normal distribution and scaled by sqrt(1 / n), where n is the number of inputs to the layer. This is a Xavier-style scheme that helps keep the initial pre-activations out of the saturated regions of tanh. Biases are initialized to zero.
- Forward Propagation:
  - Layer 1: Compute the weighted sum Z1 and apply the tanh activation function to get A1.
  - Layer 2: Compute the weighted sum Z2 from the hidden layer activations A1 and apply the tanh activation function to get the final output A2.
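The forward pass can be paired with a backward pass that uses the derivative 1 - tanh^2(x). The following is a sketch under stated assumptions, not part of the original example: the targets Y are made up, the loss is taken to be half the mean squared error, and the function names forward, mse_loss, and backward are hypothetical. A finite-difference check on one weight is included to confirm the gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy inputs as the forward-pass example; the targets Y are made up.
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]], dtype=float)
Y = np.array([[-1.0], [1.0], [1.0], [-1.0]])

W1 = rng.standard_normal((3, 4)) * np.sqrt(1 / 3)
b1 = np.zeros((1, 4))
W2 = rng.standard_normal((4, 1)) * np.sqrt(1 / 4)
b2 = np.zeros((1, 1))

def forward(X, W1, b1, W2, b2):
    A1 = np.tanh(X @ W1 + b1)
    A2 = np.tanh(A1 @ W2 + b2)
    return A1, A2

def mse_loss(A2, Y):
    # Assumed loss: half the mean squared error over the batch.
    return 0.5 * np.sum((A2 - Y) ** 2) / Y.shape[0]

def backward(X, Y, A1, A2, W2):
    m = X.shape[0]
    dZ2 = (A2 - Y) / m * (1.0 - A2 ** 2)   # tanh'(Z2) = 1 - A2^2
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0, keepdims=True)
    dZ1 = (dZ2 @ W2.T) * (1.0 - A1 ** 2)   # tanh'(Z1) = 1 - A1^2
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)
    return dW1, db1, dW2, db2

A1, A2 = forward(X, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(X, Y, A1, A2, W2)

# Gradient check on a single weight with a central finite difference.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (mse_loss(forward(X, W1_plus, b1, W2, b2)[1], Y)
           - mse_loss(forward(X, W1_minus, b1, W2, b2)[1], Y)) / (2 * eps)

print(dW1[0, 0], numeric)   # the two values should agree closely
```

Note that the backward pass needs only the activations A1 and A2, not the pre-activations, because the tanh derivative is expressed through its output.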
Summary
The tanh function is a widely used activation function in neural networks due to its zero-centered output and stronger gradients compared to the sigmoid function. It is particularly useful in hidden layers and recurrent neural networks. Despite its advantages, it also suffers from the vanishing gradient problem for extreme input values. Understanding the properties and appropriate usage of the tanh function is crucial for effectively designing and training neural networks.