Manav Goyal

Data Modeling - D B M S

Entity-Relationship (ER) Model
Relational Model
Transform ER Model to Relational Model (ER-Diagram to Tables)

Entity-Relationship (ER) Model

It is a high level data model based on a perception of a real world that consists of a collection of basic objects, called entities and of relationships among these objects
Graphical representation of ER Model is ER diagram, which acts as a blueprint of DB
Entity
- An Entity is a “thing” or “object” in the real world that is distinguishable from all other objects
- It has physical existence
- Entity can be uniquely identified (By a primary attribute, aka Primary Key)
- Types
  - Strong Entity
    - Can be uniquely identified
    - Represented by Square
  - Weak Entity
    - Can’t be uniquely identified, Depends on some other strong entity
    - It doesn’t have sufficient attributes, to select a uniquely identifiable attribute
    - Weak entity depends on strong entity for existence
    - Represented by Double Square
- Entity set
  - It is a set of entities of the same type that share the same properties, or attributes
Attribute
- An entity is represented by a set of attributes
- Each entity has a value for each of its attributes
- For each attribute, there is a set of permitted values, called the domain/value set
- Represented by Eclipse
- Types of Attributes
  - Simple
    - Attributes which can’t be divided further
  - Composite
    - Can be divided into subparts (that is, other attributes)
    - If user wants to refer to an entire attribute or to only a component of the attribute
  - Single-valued
    - Only one value attribute
    - Represented by Single Eclipse
  - Multi-valued
    - Attribute having more than one value
    - Limit constraint may be applied, upper or lower limits
    - Represented by Double Eclipse
  - Derived
    - Value of this type of attribute can be derived from the value of other related attributes
    - Represented by Dotted Eclipse
  - NULL Value
    - An attribute takes a null value when an entity does not have a value for it
    - It may indicate "not applicable", value doesn’t exist
      - Example => Person having no middle-name
    - It may indicate "unknown"
      - Unknown can indicate missing entry
        
        Example => Name value of a customer is NULL, means it is missing as name must have some value
      - Not known
  - Stored
    - Which can't be derived from other Attributes
  - Key
    - Having unique value
    - Represented by Circle with Underline
  - Required & Optimal
    - If the value is Required or not
  - Complex
    - Composite + Multi-valued
Relationship
- Association among two or more entities
- Attribute of Relation is called Descriptive Attribute other than the 2 Foreign keys referencing to Primary Key of each table
- Types
  - Strong Relationship
    - Between two independent entities
    - Represented by Diamond
  - Weak Relationship
    - Between weak entity and its owner/strong entity
    - Represented by Double-Diamond
- Degree of Relationship => Number of entities participating in a relationship
  - Unary, Only one entity participates
    - Example => Employee manages employee
  - Binary, two entities participates
    - Binary are common
    - Example => Student takes Course
  - Ternary relationship, three entities participates
    - Example => Employee works-on branch, employee works-on job
Relationships Constraints
- Mapping Cardinality/Cardinality Ratio
  - Number of entities to which another entity can be associated via a relationship
  - Types
    - One to One (1:1)
      - Entity in A associates with at most one entity in B, where A & B are entity sets. And an entity of B is associated with at most one entity of A
      - One Value should be taken from both Entity => Primary Key can be from any of 2
      - Can be Reduced to 2 Tables
    - One to Many (1:M)
      - Entity in A associated with N entity in B. While entity in B is associated with at most one entity in A
      - Primary Key in Relationship table will be from the "many" side
      - Can be merged with the "many" table
    - Many to One (M:1)
      - Entity in A associated with at most one entity in B. While entity in B can be associated with N entity in A
      - Primary Key will be from the "many" side
      - Can be merged with the "many" table
    - Many to Many (M:N)
      - Entity in A associated with N entity in B. While entity in B also associated with N entity in A
      - Composite Key will be formed
      - Can't be reduced
- Participation Constraints (Minimum cardinality constraint)
  - Types
    - Partial Participation
      - Not all entities are involved in the relationship instance
      - Represented by Single-line
    - Total Participation
      - Each entity must be involved in at least one relationship instance
      - Represented by Double-line
  - Weak entity has total participation constraint, but strong may not have total
Specialisation
- Specialisation is splitting up the entity set into further sub entity sets on the basis of their functionalities, specialties and features
- It is a Top-Down approach
- Eg: Person entity set can be divided into customer, student, employee. Person is superclass and other specialised entity sets are subclasses
  - We have "is-a" relationship between superclass and subclass
  - Represented by triangle component
- Why Specialisation?
  - Certain attributes may only be applicable to a few entities of the parent entity set
  - DB designer can show the distinctive features of the sub entities
  - To group such entities we apply Specialisation, to overall refine the DB blueprint
Generalisation
- It is just a reverse of Specialisation
- DB Designer, may encounter certain properties of two entities are overlapping. Designer may consider to make a new generalised entity set. That generalised entity set will be a super class.
- "is-a" relationship is present between subclass and super class
- Eg: Car, Jeep and Bus all have some common attributes, to avoid data repetition for the common attributes. DB designer may consider to Generalise to a new entity set "Vehicle"
- It is a Bottom-up approach
- Why Generalisation?
  - Makes DB more refined and simpler
  - Common attributes are not repeated
Attribute Inheritance
- Both Specialisation and Generalisation, has attribute inheritance
- The attributes of higher level entity sets are inherited by lower level entity sets
- Eg: Customer & Employee inherit the attributes of Person
Participation Inheritance
- If a parent entity set participates in a relationship then its child entity sets will also participate in that relationship
Aggregation
- To show relationships among relationships
- Abstraction is applied to treat relationships as higher-level entities. We can call it Abstract entity
- Avoid redundancy by aggregating relationship as an entity set itself
Steps to make ER diagram
- Requirement engineering
- Identify identity sets
- Identify attributes & its types
- Identify relationships & its constraints (Mapping, Participation)

Relational Model

Relational Model (RM) organizes the data in the form of relations (tables)
A relational DB consists of collection of tables, each of which is assigned a unique name
Row => Represents a relationship among a set of values, and table is collection of such relationships
Tuple => A single row of the table representing a single data point / a unique record
Columns => Represents the attributes of the relation
- Each attribute, there is a permitted value, called domain of the attribute
Relation Schema => Defines the design and structure of the relation, contains the name of the relation and all the columns/attributes
- Eg: TableName (Column1, Column2, Column3)
Implementation (Database creation)
- RDBMS is a software that implements RM
- Eg: Oracle, IBM, MySQL, MS Access
Degree of table => Number of attributes/columns in a given table/relation
Cardinality => Total no. of tuples in a given relation
Relational Key => Set of attributes which can uniquely identify an each tuple
Important properties of a Table in Relational Model
- The name of relation is distinct among all other relation
- The values have to be atomic. Can’t be broken down further
- The name of each attribute/column must be unique
- Each tuple must be unique in a table
- The sequence of row and column has no significance
- Tables must follow integrity constraints => Helps to maintain data consistency across the tables
Relational Model Keys
- Super Key (SK)
  - Any Permutation & Combination of attributes present in a table which can uniquely identify each tuple
- Candidate Key (CK)
  - Minimum subset of super keys, which can uniquely identify each tuple
  - It contains no redundant attribute
  - CK value shouldn’t be NULL
- Primary Key (PK)
  - Selected out of CK set, has the least no. of attributes
- Alternate Key (AK)
  - All CK except PK
- Foreign Key (FK)
  - It creates relation between two tables
  - FK helps to cross reference between two different relations
  - A relation, say r1 (Child table/Referencing relation), may include among its attributes the PK of an other relation, say r2 (Parent table/Referenced relation)
    - This attribute is called FK from r1 referencing r2
- Composite Key
  - PK formed using at least 2 attributes
- Compound Key
  - PK which is formed using 2 FK
- Surrogate Key
  - Synthetic PK
  - Generated automatically by DB, usually an integer value
  - May be used as PK
Integrity Constraints
- CRUD Operations must be done with some integrity policy so that DB is always consistent
- Introduced so that we do not accidentally corrupt the DB
- Domain Constraints
  - Restricts the value in the attribute of relation, specifies the Domain
  - Restrict the Data types of every attribute
- Entity Constraints
  - Every relation should have PK & PK != NULL
- Referential Constraints
  - Specified between two relations & helps maintain consistency among tuples of two relations
  - FK must have the matching PK for its each value in the parent table or it must be NULL
  - Insert constraint
    - Value can't be inserted in the child table if the value is not present in the parent table
  - Delete constraint
    - Value can't be deleted in the parent table if the value is present in the child table
    - On Delete Cascade
      - When deleting value from the parent table, delete corresponding entry from the child table
    - On Delete Null
      - When deleting value from the parent table, value of foreign key of corresponding entry from the child table becomes null
- Key Constraints
  - NOT NULL => This constraint will restrict the user from not having a NULL value
    - It ensures that every element in the database has a value
  - UNIQUE => It helps us to ensure that all the values consisting in a column are different from each other
  - DEFAULT => It is used to set the default value to the column
    - The default value is added to the columns if no value is specified for them
  - CHECK => It keeps the check that integrity of data is maintained before and after the completion of the CRUD
    - Limits value range (Domain)
  - PRIMARY KEY => This is an attribute or set of attributes that can uniquely identify each entity in the entity set
    - The primary key must contain unique as well as not null values
  - FOREIGN KEY => Common attribute that defines some relationship between two entities must be the primary key of an entity set and will become the foreign key of another entity set
    - This key will prevent every action which can result in loss of connection between tables

Transform ER Model to Relational Model (ER-Diagram to Tables)

Strong Entity
- Becomes an individual table with entity name, attributes becomes columns of the relation
- Entity’s Primary Key (PK) is used as Relation’s PK
- FK are added to establish relationships with other relations
Weak Entity
- A table is formed with all the attributes of the entity
- PK of its corresponding Strong Entity will be added as FK
- PK of the relation will be a composite PK, {FK + Partial discriminator Key}
Single Values Attributes
- Represented as columns directly in the tables/relations
Composite Attributes
- Handled by creating a separate attribute itself in the original relation for each composite attribute
Multivalued Attributes
- New tables (named as original attribute name) are created for each multivalued attribute
- PK of the entity is used as column FK in the new table
- Multivalued attribute’s similar name is added as a column to define multiple values
- PK of the new table would be {FK + multivalued name}
Derived Attributes => Not considered in the tables
Generalisation
- Method-1
  - Create a table for the higher level entity set
  - For each lower-level entity set, create a table that includes a column for each of the attributes of that entity set plus a column for each attribute of the primary key of the higher-level entity set
- Method-2
  - If the generalisation is disjoint and complete—that is, if no entity is a member of two lower-level entity sets directly below a higher-level entity set, and if every entity in the higher level entity set is also a member of one of the lower-level entity sets. Here, do not create a table for the higher-level entity set
  - Instead, for each lower-level entity set, create a table that includes a column for each of the attributes of that entity set plus a column for each attribute of the higher-level entity sets
  - Drawbacks
    - If the second method were used for an overlapping generalisation, some values would be stored twice unnecessarily
    - Similarly, if the generalisation were not complete, if some accounts were neither savings nor current accounts, then such accounts could not be represented with the second method
Aggregation
- Table of the relationship set is made
- Attributes includes primary keys of entity set and aggregation set’s entities
- Also, add descriptive attribute if any on the relationship