CS224 Chapter 2 - Normalizing the Database Table Structure
- ❑ Dependencies
- To understand normalization, you have to understand dependencies
- A dependency is when the value of one attribute is dependent on the value of another attribute.
- So if X is dependent on Y, then the relationship is Y -> X (value of Y determines X, in other words)
- If Z is also dependent on Y, then the relationship is Y -> X, Z
- Example with customer data - pg. 52
- A dependency based on only part of the PK is known as a partial dependency
- A dependency based on an attribute that is not part of the PK is known as a transitive dependency
- ❑ Data Redundancy
- Partial and transitive dependencies should be avoided because they indicate the presence of data redundancies.
- A data redundancy exists when multiple values of an attribute exist within a table and the multiple values are not required to establish the relationships between tables
- Thus, multiple FK values are not redundant since the FK values are required to implement the relationships between related tables
- Example pg. 53
- The database storage standard requires that, except for foreign key and replicated data requirements, data must occur in only one place at one time. Any departure from this is known as an anomaly.
- In a data anomaly, having to update the same field in more than one table is known as an update anomaly.
- Data anomalies also cause management and information penalties
- ❑ The Normal Forms
- A table's normal form is determined by the type(s) of dependencies it may exhibit.
- The lowest implementable normal form is labeled 1NF or first normal form
- ❑ A database table is in 1NF if the following conditions are met:
- The PK entity integrity requirements are met
- Each row/column intersection can contain one and only one value.
- All of the table's attributes are dependent on the PK
- Example Fig. 2.2, pg. 52
- ❑ A table is in second normal form or 2NF, if:
- All 1NF conditions are met
- The partial dependencies have been removed
- ❑ A table is in third normal form (or 3NF), if:
- it meets all the 2NF conditions
- it contains no transitive dependencies
- Dependency diagrams are structures that display all dependencies and their sources (Example fig. 2.3, pg. 55)
- The normalization process is also known as decomposition, since the initial dependency diagram will be broken up to form a new set of tables
- ❑ To change any 1NF structure to higher-level normal forms:
- Identify the PK in the initial dependency diagram (fig. 2.4, pg. 56)
- Using the basic template developed in the previous step, write the dependent attributes next to each PK you identified (fig. 2.5, pg. 56)
- Break out all transitive dependencies to become new table structures (Fig. 2.6 pg. 57)
- After normalizing the tables, it is appropriate to examine whether or not the attributes meet all the information requirements. If so, we revise the ERD appropriately.
- ❑ The Quest for Higher Normal Forms
- Although higher level normal forms are preferred, sometimes 2NF is better than 3NF for performance reasons
- Data warehouses, however, require 2NF structures
- The Boyce-Codd normal form or BCNF, is a special case of the 3NF.
- Remember a table is in 3NF when it meets the 2NF requirements and it contains no transitive dependencies
- But a transitive dependency exists only when a non-key attribute determines another non-key attribute (a non-key attribute is one that is neither a PK nor a part of a PK)
- It is possible, however, for a non-key attribute to be the determinant of a PK or part of a PK without violating the 3NF requirements. (Fig. 2.8, pg. 59)
- ❑ Normalization Limitations: the ERD's Role
- You should not rely on normalization alone to produce good database design.
- Sometimes a design that is 3NF does not meet design requirements (example Table 2-1, pg. 59)