Database Normalization
Database Normalization
Rating:
A database plays a very important role in any business. All organizations, whether big or small, keep records of their business activities. A database is a systematic representation of data. Data is organized in the form of rows and columns so that it can be searched and accessed easily.
Managing data is a very important and complex task, as data plays a vital role in taking decisions. For example, a company makes decisions regarding their future sales and production based upon previous years data. If the data for previous years is not well organized and correct, then it will be difficult for the company to take any decision about future sales and production.
Before starting the discussion about normalization, it is required to learn why we do normalization? What is the need of normalizing a database? To answer all these questions, have a look at the table given below:

The following table has a number of complications. In this table, if we want to add a new course, then it cannot be inserted until a student enrolls in the course. This indicates insertion anomaly.
Suppose Carry leaves the course, then the related data, i.e. Professor name and Course, will also be deleted, but this is not the actual case. A course will remain in the college no matter how many students have left. Suppose Mr. Joseph starts teaching English instead of Economics, then all the rows will be affected. This indicates update anomaly. In the above-mentioned example, there is only one row where update is required, but think of a table that has thousands of records; it will prove to be a herculean task to change all the records. In order to remove all these anomalies from the database, we need to normalize the database.
Database normalization is a vital aspect of data modeling. There are various advantages of normalization. These are listed below:
- It reduces data redundancy, i.e. there are no duplicate values in the database. Duplicate values slow down the searching process, requires more space in memory, and above all, results in anomalies such as update. If data is changed in one place, then the duplicate copies of it must be changed to avoid data inconsistency problem.
- It is a reversible process. It splits the table into one or more tables, but this decomposition is loss less, i.e. if we perform a join on these tables, we get the original table and no unauthentic data is introduced.
- It reduces deletion and insertion anomalies.
Before understanding normalization, the term functional dependency should be well understood.
According to functional dependency, the value of an attribute can be determined by another attribute. The attribute that is dependent on another attribute is known as dependent, and the attribute that determines another attribute is known as determinant. B is functionally dependent on A, and A functionally determines B if A uniquely determines B.
The above-mentioned statement must be true for functional dependency to exist between two attributes.
There are mainly five types of normal forms. Three of them are discussed in the next section:
The first normal form states that all the rows must have atomic values, i.e. no row can have more than one value. The table given below is in 1NF, as each column contains a single value in each row.

According to the second normal form, the table should be in first normal form and all the non-key attributes must be functionally dependent on the primary key attribute. If there is a composite key attribute, then all the non-key attributes must be functionally dependent on the whole. For example, if there are three attributes: A, B, and C, and AB forms composite key, then C must be dependent on both, i.e. AB ->C.
The second normal form is violated if an attribute that is not the key attribute functionally determines other attributes in the table. Let's take an example for a better understanding.
A table named Student is given below:
Student
| StID Name Address Picture Format Byte Size Dimensions |
Here, StID is the primary key attribute. The Picture attribute is not the key attribute; however, Format, Byte, Dimensions attributes are dependent on Picture, and hence this is a violation of 2NF. To make this in 2NF, split the table in two tables, Student and Picture. All the attributes that depend on the Picture attribute should be moved to the Picture table and should be removed from the original table.
Student
| Picture
|
According to the third normal form, all the non-key attribute must be functionally dependent on the primary key and nothing else, and there should be no transitivity, i.e. if A, B, and C are three attributes, given that B is functionally dependent on A, C is functionally dependent on B, then C is also functionally dependent on A. This type of dependency is known as transitive dependency.
In other words, C is indirectly dependent on A. For a better understanding of 3NF, let's take an example. Below is a table named Employee:
| A101 | Smith | Marketing Executive |
| A102 | Roger | Finance Officer |
| A103 | Andrew | Sales Executive |
In this table, EmpID is the primary key. The Name attribute is functionally dependent on EmpID, and Designation is dependent on Name; therefore, Designation is also indirectly dependent on EmpID. To make this table in 3NF, split the table into two tables Employee and Designation as given below:
Employee
| Designation
|
Rating:
Other articles
- Skills required for CIW 1D0-441
- What are the components of the Relational Model?
- What are table aliases?
- What is a SQLException?
- What is an entity?
