What is an Activation Function?

An activation function is a mathematical function applied to the output of each neuron in a neural network. It determines whether a neuron should be activated or not based on its input. Activation functions introduce non-linearity into the network, allowing it to model complex patterns and interactions in the data.
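For example, a single neuron first computes a weighted sum of its inputs and then passes that sum through the activation function. The small sketch below uses made-up weights, a made-up bias, and made-up inputs purely for illustration:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Made-up illustrative values, not from any trained model
x = np.array([0.5, -1.2, 3.0])   # inputs to the neuron
w = np.array([0.4, 0.7, -0.2])   # weights
b = 0.1                          # bias

z = np.dot(w, x) + b    # pre-activation: weighted sum plus bias
a = sigmoid(z)          # activation: squashes z into (0, 1)
print("z =", z, "a =", a)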

Why Do We Need Activation Functions?

  1. Introducing Non-linearity:
    • Real-world data is often non-linear, and non-linear activation functions are essential for modelling such relationships. Without them, a stack of layers collapses into a single linear model, no matter how many layers the network has (see the sketch after this list).
  2. Enabling Deep Learning:
    • Activation functions enable neural networks to stack multiple layers, making them deep. Each layer can learn different levels of abstraction thanks to the non-linearity introduced by the activation function.
  3. Controlling Neuron Output:
    • Activation functions help control the output of neurons, ensuring that outputs fall within a known range (e.g., between 0 and 1 for sigmoid).
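To make point 1 concrete, here is a minimal sketch (with arbitrary random weights and biases omitted for brevity) showing that two stacked linear layers collapse to a single linear map, while inserting a ReLU between them does not:

import numpy as np

np.random.seed(0)
W1 = np.random.randn(3, 4)   # weights of layer 1
W2 = np.random.randn(4, 2)   # weights of layer 2
x = np.random.randn(5, 3)    # a batch of 5 inputs with 3 features

# Two stacked linear layers with no activation...
out_linear = (x @ W1) @ W2
# ...are exactly one linear layer with weights W1 @ W2
out_collapsed = x @ (W1 @ W2)
print(np.allclose(out_linear, out_collapsed))     # True

# Inserting ReLU between the layers breaks the collapse
out_nonlinear = np.maximum(0, x @ W1) @ W2
print(np.allclose(out_nonlinear, out_collapsed))  # False (in general)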

Common Activation Functions

  1. Sigmoid Function:
    • Formula: \sigma(x) = \frac{1}{1 + e^{-x}}
    • Range: (0, 1)
    • Used in: Output layers for binary classification problems.
    • Pros: Smooth gradient, output values bound between 0 and 1.
    • Cons: Vanishing gradient problem, outputs not zero-centered.
  2. Hyperbolic Tangent (Tanh) Function:
    • Formula: \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
    • Range: (-1, 1)
    • Used in: Hidden layers.
    • Pros: Zero-centered output, stronger gradients than sigmoid.
    • Cons: Vanishing gradient problem.
  3. ReLU (Rectified Linear Unit):
    • Formula: \text{ReLU}(x) = \max(0, x)
    • Range: [0, ∞)
    • Used in: Hidden layers.
    • Pros: Computationally efficient, mitigates the vanishing gradient problem, induces sparsity (many neurons output exactly zero).
    • Cons: Neurons can get stuck outputting zero and stop learning (the dying ReLU problem).
  4. Leaky ReLU:
    • Formula: \text{Leaky ReLU}(x) = \max(0.01x, x)
    • Range: (-∞, ∞)
    • Used in: Hidden layers.
    • Pros: Addresses the dying ReLU problem by allowing a small, non-zero gradient when the unit is not active.
    • Cons: The negative-side slope (0.01 here) is fixed and chosen by hand, which is not always optimal.
  5. Softmax Function:
    • Formula: \text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
    • Range: (0, 1), with all outputs summing to 1.
    • Used in: Output layers for multi-class classification problems.
    • Pros: Converts logits to probabilities, useful for multi-class classification.
    • Cons: Can be computationally expensive for a large number of classes.
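
Before the full network example below, here is a minimal NumPy sketch of the remaining functions from the list (the function names and the 0.01 Leaky ReLU slope are my own choices, matching the formulas above):

import numpy as np

def tanh(x):
    return np.tanh(x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def softmax(x):
    # Subtracting the row-wise max keeps the exponentials numerically stable
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

x = np.array([-2.0, 0.0, 3.0])
print(tanh(x))         # values in (-1, 1)
print(leaky_relu(x))   # negatives scaled by 0.01, positives unchanged
print(softmax(x))      # non-negative values that sum to 1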

Example of Using Activation Functions

Here’s an example of a simple two-layer neural network that uses ReLU in the hidden layer and sigmoid in the output layer:

import numpy as np

def sigmoid(x):
    # Squash values into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def relu(x):
    # Zero out negative values, pass positive values through unchanged
    return np.maximum(0, x)

def initialize_parameters(input_size, hidden_size, output_size):
    # Scale random weights by sqrt(1 / fan_in) so early activations stay in a reasonable range
    W1 = np.random.randn(input_size, hidden_size) * np.sqrt(1 / input_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size) * np.sqrt(1 / hidden_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2

def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)  # Apply ReLU activation function
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)  # Apply Sigmoid activation function
    return Z1, A1, Z2, A2

# Define the neural network structure
input_size = 3  # Number of input features
hidden_size = 4  # Number of neurons in the hidden layer
output_size = 1  # Number of output neurons

# Initialize parameters
W1, b1, W2, b2 = initialize_parameters(input_size, hidden_size, output_size)

# Input data (example)
X = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]])

# Forward propagation
Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)

# Print the outputs
print("Z1:", Z1)
print("A1:", A1)
print("Z2:", Z2)
print("A2:", A2)

Summary

Activation functions are vital in neural networks for introducing non-linearity, which allows the network to model complex patterns. They control the output of neurons and enable the stacking of multiple layers, making deep learning possible. Different activation functions are used in various parts of the network depending on the specific requirements and characteristics of the data.
