
Normalization and denormalization


Normalization:

Normalization is a database design technique used to organize tables and reduce data redundancy in a relational database. The goal is to eliminate data anomalies (insert, update, and delete anomalies) and ensure that data is stored efficiently. Normalization is typically carried out through a series of normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), each addressing specific issues.

Why Normalization?

The following table has issues that are problematic for any database, such as the repeated Branch and Branch ID values. Imagine that the administration decides to change a Branch ID: every matching record in the table would need to be updated.

| Roll No. | Name     | Age | Branch | Branch ID |
|----------|----------|-----|--------|-----------|
| 1        | Anmol    | 24  | CSE    | 101       |
| 2        | Ansh     | 23  | CSE    | 101       |
| 3        | Akshay   | 25  | CSE    | 101       |
| 4        | Bhuvnesh | 26  | CSE    | 102       |
The table above has anomalies and needs to be normalized.

Let’s understand the anomalies in a database before moving on to normalization.

  • Update anomaly: if copies of a data item are scattered across several places and are not linked properly, an update may change some copies while leaving others with old values. Such instances leave the database in an inconsistent state.
  • In the table above, if Branch ID 101 is updated to 102, every row that stores it must be updated.
  • Delete anomaly: deleting a record can unintentionally remove other data stored alongside it, or leave stray copies of it behind elsewhere.
  • In the example above, deleting all the student records also deletes the branch name and Branch ID, which should not be lost: we are deleting student information, yet branch information is removed along with it.

An insert anomaly occurs when certain attributes cannot be inserted into the database without the presence of other attributes. It is the converse of the delete anomaly: for example, we cannot add a new branch unless at least one student is enrolled in it.
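
To make these anomalies concrete, here is a minimal sketch in Python using the standard-library sqlite3 module (the table mirrors the example above; the column names and the in-memory database are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized table: branch data is duplicated on every student row.
cur.execute("""
    CREATE TABLE student (
        roll_no   INTEGER PRIMARY KEY,
        name      TEXT,
        age       INTEGER,
        branch    TEXT,
        branch_id INTEGER
    )
""")
cur.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [(1, "Anmol", 24, "CSE", 101),
     (2, "Ansh", 23, "CSE", 101),
     (3, "Akshay", 25, "CSE", 101),
     (4, "Bhuvnesh", 26, "CSE", 102)],
)

# Update anomaly: changing one Branch ID must touch every duplicated row.
cur.execute("UPDATE student SET branch_id = 103 WHERE branch_id = 101")
print("rows touched:", cur.rowcount)  # 3

# Delete anomaly: removing the students also erases all stored knowledge
# of their branch, because branch data lives only on student rows.
cur.execute("DELETE FROM student WHERE branch = 'CSE'")
print(cur.execute("SELECT DISTINCT branch, branch_id FROM student").fetchall())  # []
conn.close()
```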


Let’s Understand All the Normal Forms

The normalized form of the table above splits it into two relations: a Student table (Roll No., Name, Age, Branch ID) and a Branch table (Branch ID, Branch).


Normal Forms

First Normal Form (1NF)

  • Ensures that each column in a table contains atomic (indivisible) values.
  • Eliminates repeating groups and arrays.
  • Example: Splitting a “Books” table with a “Genres” column into two tables, one for books and another for genres.

Rule

First Normal Form is part of the definition of relations (tables) themselves: every attribute in a relation must have an atomic domain, and the values in an atomic domain are indivisible units.
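
As a sketch of the “Books”/“Genres” split mentioned above (Python with sqlite3; the exact table and column names are assumptions for illustration), a multi-valued genres column becomes a separate table with one row per book/genre pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 1NF: a single genres column holding "Sci-Fi, Adventure"
# would store a divisible, non-atomic value.

# 1NF design: atomic values only, one row per book/genre pair.
cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("""
    CREATE TABLE book_genres (
        book_id INTEGER REFERENCES books(id),
        genre   TEXT,
        PRIMARY KEY (book_id, genre)
    )
""")
cur.execute("INSERT INTO books VALUES (1, 'Dune')")
cur.executemany("INSERT INTO book_genres VALUES (?, ?)",
                [(1, "Science Fiction"), (1, "Adventure")])

# Each genre is now individually queryable instead of buried in a string.
print(cur.execute(
    "SELECT genre FROM book_genres WHERE book_id = 1").fetchall())
conn.close()
```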

Second Normal Form (2NF)

  • Extends 1NF and ensures that all non-key attributes are fully functionally dependent on the primary key.
  • Eliminates partial dependencies.
  • Example: Breaking a table with a composite primary key into separate tables.

Rule

A relation is in Second Normal Form if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.

Consider a relation whose composite primary key is (Stu_ID, Proj_ID) and which also stores Stu_Name and Proj_Name. Stu_Name can be identified by Stu_ID alone, and Proj_Name can be identified by Proj_ID alone. This is called a partial dependency, which is not allowed in Second Normal Form.
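
A minimal sketch of the 2NF decomposition (Python with sqlite3; the Student/Project table and column names are illustrative assumptions) might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 2NF decomposition: the partial dependencies Stu_ID -> Stu_Name and
# Proj_ID -> Proj_Name each move into their own relation; the linking
# table keeps only attributes dependent on the full composite key.
cur.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")
cur.execute("CREATE TABLE project (proj_id INTEGER PRIMARY KEY, proj_name TEXT)")
cur.execute("""
    CREATE TABLE student_project (
        stu_id  INTEGER REFERENCES student(stu_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (stu_id, proj_id)
    )
""")
cur.execute("INSERT INTO student VALUES (1, 'Anmol')")
cur.execute("INSERT INTO project VALUES (10, 'Compiler')")
cur.execute("INSERT INTO student_project VALUES (1, 10)")

# Renaming a project now touches exactly one row, no matter how many
# students are assigned to it.
cur.execute("UPDATE project SET proj_name = 'Compilers' WHERE proj_id = 10")
print("rows touched:", cur.rowcount)  # 1
conn.close()
```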

Third Normal Form (3NF)

  • Extends 2NF and ensures that there are no transitive dependencies.
  • Eliminates non-prime attributes that depend on other non-prime attributes.
  • Example: Removing columns that depend on other non-key attributes.

Rule

  • It must be in Second Normal Form.
  • No non-key attribute is transitively dependent on the primary key.

Consider the relation Student_Detail(Stu_ID, Stu_Name, Zip, City), where Stu_ID is the key and the only prime attribute. City can be identified by Stu_ID, but also by Zip itself. Neither is Zip a superkey, nor is City a prime attribute. Since Stu_ID → Zip → City, a transitive dependency exists, and the relation must be decomposed.
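
Here is a hedged sketch of the 3NF decomposition just described (Python with sqlite3; column spellings such as stu_id and the sample zip data are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 3NF decomposition: the transitive dependency Stu_ID -> Zip -> City is
# broken by moving Zip -> City into its own ZipCodes relation.
cur.execute("CREATE TABLE zip_codes (zip TEXT PRIMARY KEY, city TEXT)")
cur.execute("""
    CREATE TABLE student_detail (
        stu_id   INTEGER PRIMARY KEY,
        stu_name TEXT,
        zip      TEXT REFERENCES zip_codes(zip)
    )
""")
cur.execute("INSERT INTO zip_codes VALUES ('110001', 'New Delhi')")
cur.executemany("INSERT INTO student_detail VALUES (?, ?, ?)",
                [(1, "Anmol", "110001"), (2, "Ansh", "110001")])

# City is stored once per zip, so correcting it is a single-row update
# no matter how many students share that zip code.
cur.execute("UPDATE zip_codes SET city = 'Delhi' WHERE zip = '110001'")
print(cur.execute("""
    SELECT s.stu_name, z.city
    FROM student_detail s JOIN zip_codes z ON s.zip = z.zip
""").fetchall())
conn.close()
```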

Boyce-Codd Normal Form (BCNF)

  • A stricter form of 3NF, in which the determinant (left-hand side) of every non-trivial functional dependency must be a superkey.
  • Eliminates dependencies in which a non-superkey determines other attributes.
  • Example: Moving an attribute such as Zip, which determines City, into its own relation keyed by that attribute.

Rule:

Boyce-Codd Normal Form (BCNF) is a stricter extension of Third Normal Form. BCNF states that:

For any non-trivial functional dependency, X → A, X must be a super-key.

In the decomposition above, Stu_ID is the superkey of the relation Student_Detail and Zip is the superkey of the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip

and

Zip → City

which confirms that both relations are in BCNF.
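
The BCNF condition can also be checked mechanically. Below is a small, illustrative Python sketch (not from the lesson) that computes attribute closures and confirms that both decomposed relations satisfy the rule:

```python
# A relation is in BCNF when the determinant (left-hand side) of every
# non-trivial functional dependency is a superkey. This checker computes
# attribute closures to verify that.

def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(fds, all_attrs):
    """True if every determinant is a superkey of the relation."""
    return all(closure(lhs, fds) == all_attrs for lhs, _ in fds)

# Student_Detail(Stu_ID, Stu_Name, Zip) with Stu_ID -> {Stu_Name, Zip}
sd_fds = [({"Stu_ID"}, {"Stu_Name", "Zip"})]
print(is_bcnf(sd_fds, {"Stu_ID", "Stu_Name", "Zip"}))  # True

# ZipCodes(Zip, City) with Zip -> {City}
zc_fds = [({"Zip"}, {"City"})]
print(is_bcnf(zc_fds, {"Zip", "City"}))  # True
```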

Denormalization:

Denormalization is the process of combining tables that have been normalized. While normalization reduces redundancy and improves data integrity, it can also lead to increased complexity in queries and potentially slower performance for read-heavy workloads. Denormalization aims to optimize query performance by reintroducing redundancy and reducing the number of joins needed for common queries.

Key Concepts:

  1. Increased Query Performance:
    • Reduces the need for complex joins, leading to faster query performance.
    • Useful in read-heavy scenarios, where optimizing for query speed is a priority.
  2. Data Redundancy:
    • Reintroduces redundancy in the form of duplicated data.
    • Can lead to potential data anomalies during updates, inserts, and deletes.
  3. Simplified Queries:
    • Simplifies queries by reducing the number of tables involved.
    • Useful for scenarios where the database is more read-centric than write-centric.
  4. Materialized Views:
    • Denormalization can involve creating materialized views that store precomputed results of complex queries, further optimizing read operations (see the sketch below).
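
As a minimal sketch of denormalization (Python with sqlite3; since SQLite has no materialized views, a summary table built with CREATE TABLE ... AS SELECT stands in for one, and all names are illustrative), the student/branch split is folded back into one read-optimized table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized source tables.
cur.execute("CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, branch TEXT)")
cur.execute("""
    CREATE TABLE student (
        roll_no   INTEGER PRIMARY KEY,
        name      TEXT,
        branch_id INTEGER REFERENCES branch(branch_id)
    )
""")
cur.execute("INSERT INTO branch VALUES (101, 'CSE')")
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, "Anmol", 101), (2, "Ansh", 101)])

# Denormalized, read-optimized copy: the join cost is paid once at build
# time. SQLite has no materialized views, so a plain table stands in for
# one; it must be rebuilt or kept in sync whenever the sources change.
cur.execute("""
    CREATE TABLE student_report AS
    SELECT s.roll_no, s.name, b.branch, b.branch_id
    FROM student s JOIN branch b ON s.branch_id = b.branch_id
""")

# Reads now need no join at all.
print(cur.execute("SELECT name, branch FROM student_report").fetchall())
conn.close()
```

Note that the report table silently goes stale when the source tables change; that is exactly the redundancy trade-off described in the list above.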

Choosing Between Normalization and Denormalization:

  • Use Normalization When:
    • The database is write-intensive.
    • Data integrity is of utmost importance.
    • The database design needs to be flexible and adaptable.
  • Use Denormalization When:
    • The database is read-intensive, and query performance is a priority.
    • There is a clear understanding of the read patterns and queries.
    • The trade-off between redundancy and query speed is acceptable.

Conclusion:

Normalization and denormalization are complementary techniques that should be used based on the specific needs of a database. A balanced approach, often referred to as “controlled denormalization,” involves selectively denormalizing certain parts of the database to optimize for specific queries while maintaining overall data integrity. The choice between normalization and denormalization depends on the specific requirements and usage patterns of the application.