Data and Information
- Data are facts about people, places and things
- Information is defined as processed data
- Example - Table 1-1, pg. 2
- Information forms the foundation for decision making
- You cannot generate good information from bad data
- How and in what format data are stored in various databases will help determine how usable and useful the data will be
The Database and the Database Management System (DBMS)
- The database as electronic filing cabinet:
- Data, kept in electronic folders, are stored in the cabinet's drawers
- There is a special drawer with metadata (data about data) kept in a data dictionary
- The data dictionary is a reference of the data components, their names and their characteristics. It also contains the definition of the relationships between the data within the electronic folders and between the drawers (We'll get more into this later)
- Database contents are managed by a DBMS (Database Management System) A DBMS manages the metadata (updating as required), transforms input to match the data dictionary requirements, creates and enforces a secure environment, manages the multiuser environment, enforces data integrity.
- The DBMS makes databases practical.
- A relational database is a more sophisticated form which makes data easier to manage by visualizing a database as a series of tables where each row is a record and each column represents a field. (More on this later)
- A relational database is managed by an RDBMS (Guess what that stands for...)
- There are many different types of databases designed for different uses. The production database is most common and is very transaction-intensive. A decision support system (DSS) is another type and is more focussed on information rather than transactions.
- Data warehouses are huge decision support systems. (In the terabyte range)
The Information System
Database Building Blocks: Entities, Entity Sets, and Attributes
- Databases store information about entities
- An entity can be a person, place, thing or even a concept, anything you want to keep track of.
- An entity set is a named collection of related data
- Each entity has attributes, otherwise known as properties or characteristics
- Example Fig. 1.3, pg. 7
Database Tables and Their Components
- Entities are stored in a database table, so a database table is the same as an entity set. A table is a matrix composed of intersecting rows and columns.
- Each column is named for an attribute and contains the values for the attribute it represents
- Each row (also known as a tuple) is uniquely identified by an attribute known as the primary key.
- A primary key may be a single attribute or a a combination of attributes, as long as it is unique for each tuple or record.
- Any combination of attributes that could be the primary key but isn't is called the candidate key.
- Realistically, you want to use the minimum number of attributes for your key.
- A secondary key is not necessarily unique but can be used to narrow down your search if you don't know the primary key
- A foreign key in a table is an attribute that matches the primary key in another table. (Example Fig. 1.5, pg. 10)
Entity Integrity and Referential Integrity
- If all of the rows of a database are uniquely identified, then the table has entity integrity
- Please note that since a primary key must have a unique value, it can't be set to null (the absence of a value)
- Proper foreign key values in one table must either match primary key values in the related table or be null. If they meet this requirement, then they have referential integrity. In other words, they can't point to primary keys that don't exist.
Putting It All Together in a Small Database
Example pgs. 11-12
Documentation: Selecting and Naming the Attributes
- We put data in a database because sooner or later we want to get information from that data. So we need to select a sufficient number of useful attributes for each entity set.
- Documentation is crucial for database design. It should give us a precise definition and description of the database's components and the relationships between those components. It's a map of the system.
- One important piece of documentation is self-explanatory names for all system components. This includes table names as well as attributes.
- Tables that contain entity sets are usually named after the entities they contain.
- table column names consist of two parts: the prefix identifies the table and the second part identifies the attribute (field) stored in that column. Therefore, an attribute name whose prefix is not related to the name of the table in which it is located will always be a foreign key.
Attribute Characteristics
- A simple or atomic attribute cannot be further divisible into its components (ex. employee gender)
- An attribute that can be divided into further components is called a composite attribute.
- It is usually good practice to use simple attributes if possible.
- Single-valued attributes can have only one value (ex. birth date)
- Multivalued attributes can have multiple values (such as a listing of an employee's college degrees)
- Good database designs generally assure that single-valued attributes are stored in the database table by converting multi-valued attributes to single-valued. Unfortunately there are tons of *bad* ways to do this. (Ex. fig. 1.11, pg. 17)
Relationship Types
- 1:1 - one entity is related to only one other entity
- This is rare and can cause problems, generating a lot of nulls in a table
- Example Fig. 1.13, pg. 20
- One way to fix this is to store the unique attributes in one table and the commmon ones in another (example Fig. 1.14, pg. 21)
- 1:M (one to many) - when an entity in one table may be related to many other entities in another table. (Example: a customer generates a lot of invoices)
- The 1:M relationship is the relational database design standard.
- Example Fig. 1.15, pg. 21
- M:N - when an entity can occur more than once on both sides of the relationship
- This can cause redundant data problems (Fig. 1.16, pg. 22)
- We can use a composite table to help solve this problem.
- An optional relationship is when an entity occurrence on one side of a relationship doesn't require a corresponding entity occurence on the other side of the relationship. (Ex. an employee may have dependents but doesn't have to.)
- A mandatory relationship is when an entity occurrence on one side must have a corresponding entity relationship on the other side.
- A weak entity is existent dependent on another entity and it's primary key is at least partially derived from the related entity's primary key. (Example, a dependent cannot exist without an employee)
- A strong entity is not existent dependent on another entity (an employee doesn't have to have any dependents.)
- A recursive relationship exists if an entity can be related to itself. (example fig. 1.18, pg. 25)
Database Design: Creating a Blueprint with an ERD
- An ERD is an Entity Relationship Diagram and is a blueprint for database design
- The ERD uses symbols to indicate entity sets.
- ERD designers don't distinguish between entity sets and entities. (Table 1-2, pg. 27)
- ERDs are based on the following features:
- An entity name is always a noun
- All entity names are capitalized
- The ERD uses a diamond to represent a relationship and a line connects the relationship to the entity
- Alll relationship names are written in lowercase and must be active or passive verbs
- A relationship is always read from the 1 side to the M side.
- A circle written next to, above or below any ERD entity indicates an optional relationship also known as optionality
- A single character, known as connectivity, is written next to each entity to indicate its relationship type.
- Each entity is labeled to indicate its cardinalities
- Example ERD - Fig. 1-21, pg. 28
Business Rules
- The ERD's components are derived from business rules
- Business rules are short and concise statements that establish the existence and composition of entities, attributes, relationships and constraints. (Example, pg. 31)
- To translate business rules into database design, practice makes perfect.
- Example pgs. 32-35
Conflicting Database Design Goals
Keep in mind your database design must meet information, transaction speed, redundancy control, expandability and maintainability requirements.
Some of these may be in conflict and you have to resolve those conflicts.
You need professional judgment as well as skill.
Be aware of trade-offs you may be forced to make.
A Sample Database Design's Organization Database