
Sigmoid Function in Neural Networks

The sigmoid function is one of the most commonly used activation functions in neural networks, especially in binary classification tasks. It maps any real-valued number into a value between 0 and 1, making it suitable for models that need to output probabilities.

Mathematical Definition

The sigmoid function, also known as the logistic function, is defined mathematically as:

\sigma(x) = \frac{1}{1 + e^{-x}}

where e is the base of the natural logarithm (approximately 2.71828).
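
As a quick sanity check, the definition can be evaluated directly in NumPy (a minimal sketch; the function name sigmoid is our own choice):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))                            # 0.5, the midpoint
print(sigmoid(np.array([-10, -1, 1, 10])))   # ~[4.54e-05, 0.269, 0.731, 0.99995]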

Characteristics of the Sigmoid Function

  1. Range: The output values of the sigmoid function lie between 0 and 1.
  2. Shape: It has an S-shaped (sigmoid) curve.
  3. Monotonicity: It is strictly increasing, so larger inputs always produce larger outputs.
  4. Derivative: The derivative of the sigmoid function is given by:

\sigma'(x) = \sigma(x)\,(1 - \sigma(x))

This property makes backpropagation cheap: the gradient can be computed from the activation value alone, with no extra call to the exponential.
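
The identity is easy to verify numerically against a central finite difference (a minimal sketch; sigmoid_derivative is our own helper name):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)          # uses only the activation value s

x = np.array([-2.0, 0.0, 2.0])
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
print(sigmoid_derivative(x))    # ~[0.105, 0.25, 0.105]
print(numeric)                  # matches to ~10 decimal places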

Advantages of the Sigmoid Function

  1. Smooth Gradient: The function is differentiable everywhere, which suits gradient-based optimization algorithms.
  2. Output Range: Since the output range is (0, 1), it is especially useful for models that need to predict probabilities.
  3. Biological Plausibility: The sigmoid function is inspired by biological neurons, whose firing rates show a similar saturating response.

Disadvantages of the Sigmoid Function

  1. Vanishing Gradient Problem: For very large or very small values of the input x, the gradient (derivative) of the sigmoid approaches zero. This can slow down or even stall training, especially in deeper networks (see the sketch after this list).
  2. Outputs Not Zero-Centered: The outputs are always positive, which can make gradient updates inefficient by causing a zigzagging effect during gradient descent.
  3. Computationally Expensive: Evaluating the exponential e^{-x} costs more than simpler activation functions such as ReLU.
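
The saturation behind point 1 is easy to observe: the gradient peaks at 0.25 at x = 0 and collapses toward zero as |x| grows (a minimal sketch reusing the same sigmoid_derivative helper idea as above):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  ->  sigma'(x) = {sigmoid_derivative(x):.2e}")
# x =   0.0  ->  sigma'(x) = 2.50e-01
# x =   2.0  ->  sigma'(x) = 1.05e-01
# x =   5.0  ->  sigma'(x) = 6.65e-03
# x =  10.0  ->  sigma'(x) = 4.54e-05

In a deep network these per-layer factors multiply during backpropagation, so the gradient reaching the early layers shrinks roughly geometrically, which is what stalls training.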

Applications

The sigmoid function is primarily used in the following scenarios:

  1. Output Layer for Binary Classification: When the neural network needs to output a probability for a binary classification task (see the sketch after this list).
  2. Logistic Regression: It is the activation function used in logistic regression.
  3. Hidden Layers (Historical Use): Historically, the sigmoid function was also used in hidden layers, but this has been largely replaced by ReLU and its variants due to the vanishing gradient problem.
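
For point 1, a common pattern is to apply the sigmoid to the raw output score (the logit) and threshold the resulting probability at 0.5 (a minimal sketch; the logit values and the 0.5 threshold are illustrative assumptions):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

logits = np.array([-1.2, 0.3, 2.5])   # raw scores from the output layer
probs = sigmoid(logits)               # probability of the positive class
preds = (probs >= 0.5).astype(int)    # class decision at the 0.5 threshold
print(probs)   # ~[0.231, 0.574, 0.924]
print(preds)   # [0 1 1]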

Example in Neural Networks

Let’s see how the sigmoid function can be used in a simple neural network.

Example Code

import numpy as np

# Define the sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Initialize parameters
def initialize_parameters(input_size, hidden_size, output_size):
    W1 = np.random.randn(input_size, hidden_size) * np.sqrt(1 / input_size)
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, output_size) * np.sqrt(1 / hidden_size)
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2

# Forward propagation using sigmoid activation
def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

# Example usage
input_size = 3  # Number of input features
hidden_size = 4  # Number of neurons in the hidden layer
output_size = 1  # Number of output neurons

# Initialize parameters
W1, b1, W2, b2 = initialize_parameters(input_size, hidden_size, output_size)

# Input data (example)
X = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]])

# Forward propagation
Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)

# Print the outputs
print("Z1:", Z1)
print("A1:", A1)
print("Z2:", Z2)
print("A2:", A2)

Explanation of the Code

  1. Sigmoid Function: Defined to map any input x to a value between 0 and 1.
  2. Parameter Initialization: Weights are drawn from a normal distribution and scaled by √(1 / fan-in), a common Xavier-style scheme; biases start at zero.
  3. Forward Propagation:
    • Layer 1: Compute the weighted sum Z1 = X·W1 + b1 and apply the sigmoid activation to get A1.
    • Layer 2: Compute the weighted sum Z2 = A1·W2 + b2 from the hidden-layer activations and apply the sigmoid again to get the final output A2.
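
Because both layers use the sigmoid, every entry of A1 and A2 lies in (0, 1); with the 4×3 input above, A2 has shape (4, 1), one value per input row. The exact numbers change from run to run because the weights are initialized randomly.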

Summary

The sigmoid function is a widely used activation function in neural networks due to its ability to map inputs to a range between 0 and 1, making it particularly useful for binary classification problems. Despite its historical significance and advantages, it has limitations like the vanishing gradient problem, leading to the adoption of alternative activation functions like ReLU in hidden layers.
