Manav Goyal

Database - D B M S

Types of Databases
DB Optimization

Types of Databases

NoSQL Databases
Relational Databases
- Based on Relational Model
- Relational databases are quite popular, even though it was a system designed in the 1970s
- Also known as relational database management systems (RDBMS), relational databases commonly use Structured Query Language (SQL) for operations such as creating, reading, updating, and deleting data
- Relational databases store information in discrete tables, which can be JOINed together by fields known as foreign keys
- Eg: MySQL, Microsoft SQL Server, Oracle
- They are ubiquitous, highly optimised for working with structured data
- They provide a stronger guarantee of data normalisation
- Scalability issues (Horizontal Scaling)
- Data become huge, system become more complex
Object Oriented Databases
- The object-oriented data model, is based on the object-oriented-programming paradigm, which is now in wide use
- Inheritance, object-identity, and encapsulation (information hiding), with methods to provide an interface to objects, are among the key concepts of object-oriented programming that have found applications in data modelling
- The object-oriented data model also supports a rich type system, including structured and collection types
- While inheritance and, to some extent, complex types are also present in the E-R model, encapsulation and object-identity distinguish the object-oriented data model from the E-R model
- In Object-oriented databases data is treated as an object
- All bits of information come in one instantly available object package instead of multiple tables
- Eg: ObjectDB, GemStone etc
- Advantages
  - Data storage and retrieval is easy and quick
  - Can handle complex data relations and more variety of data types that standard relational databases
  - Relatively friendly to model the advance real world problems
  - Works with functionality of OOPs and Object Oriented languages
- Disadvantages
  - High complexity causes performance issues like read, write, update and delete operations are slowed down
  - Not much of a community support as isn’t widely adopted as relational databases
  - Does not support views like relational databases
Hierarchical Databases
- Based on a concrete hierarchy
- The schema for hierarchical databases is defined by its tree-like organisation, in which there is typically a root “parent” directory of data stored as records that links to various other subdirectory branches, and each subdirectory branch, or child record, may link to various other subdirectory branches
- Data within records is stored in the form of fields, and each field can only contain one value
- Retrieving hierarchical data from a hierarchical database architecture requires traversing the entire tree, starting at the root node
- Since the disk storage system is also inherently a hierarchical structure, these models can also be used as physical models
- The key advantage of a hierarchical database is its ease of use
  - The one-to-many organisation of data makes traversing the database simple and fast, which is ideal for use cases such as website drop-down menus or computer folders
  - Due to the separation of the tables from physical storage structures, information can easily be added or deleted without affecting the entirety of the database
  - And most major programming languages offer functionality for reading tree structure databases
- The major disadvantage of hierarchical databases is their inflexible nature
- The tree-like organisation of data requires top-to-bottom sequential searching, which is time consuming, and requires repetitive storage of data in multiple different entities, which can be redundant
- Eg: IBM IMS
Network Databases
- Extension of Hierarchical databases
- The child records are given the freedom to associate with multiple parent records
- Organised in a Graph structure
- Can handle complex relations
- Maintenance is tedious
- M:N links may cause slow retrieval
- Not much web community support
- Eg: Integrated Data Store (IDS), IDMS (Integrated Database Management System), Raima Database Manager, TurboIMAGE

DB Optimization

Distributed Database
- A single logical database that is, spread across multiple locations (servers) and logically interconnected by network
- This is the product of applying DB optimisation techniques like Clustering, Partitioning and Sharding
Clustering
- Database Clustering (making Replica-sets) is the process of combining more than one servers or instances connecting a single database
- Database clustering, SQL server clustering, and SQL clustering are closely associated with SQL is the language used to manage the database information, sometimes used to handle large traffic
- Replicate the same dataset on different servers
- In cluster architecture, all requests are split with many computers so that an individual user request is executed and produced by a number of computer systems
- The clustering is serviceable definitely by the ability of load balancing and high-availability
- If one node collapses, the request is handled by another node
- Consequently, there are few or no possibilities of absolute system failures
- Advantages
  - Data Redundancy
    - Clustering of databases helps with data redundancy, as we store the same data at multiple servers
    - The redundancy that clustering offers is required and is quite certain due to the synchronisation
    - In case any of the servers had to face a failure due to any possible reason, the data is available at other servers to access
  - Load balancing
    - Scalability doesn’t come by default with the database, brought by clustering
    - Basically, what load balancing does is allocating the workload among the different servers that are part of the cluster
      - This can provide scaling seamlessly as required
      - This links directly to high availability
  - High availability
    - High availability refers the amount of time a database is considered available
    - The amount of availability you need greatly depends on the number of transactions you are running on your database and how often you are running any kind of analytics on your data
Partitioning
- The technique used to divide stored database objects into separate servers
- Due to this, there is an increase in performance, controllability of the data
- When Horizontally scaling our machines/servers, dealing with relational databases becomes quite tough to maintain the relations
- If we apply partitioning to the database that is already scaled out i.e. equipped with multiple servers, we can partition our database among those servers and handle the big data easily
- Types
  - Vertical Partitioning
    - Slicing relation vertically / column-wise
    - Need to access different servers to get complete tuples
  - Horizontal Partitioning
    - Slicing relation horizontally / row-wise
    - Independent chunks of data tuples are stored in different servers
- When Partitioning is Applied?
  - Dataset become much huge that managing and dealing with it become a tedious task
  - The number of requests are enough larger that the single DB server access is taking huge time and hence the system’s response time become high
- Advantages of Partitioning
  - Parallelism, Availability, Performance, Manageability, Reduce Cost
Sharding
- Technique to implement Horizontal Partitioning
- Instead of having all the data sit on one DB instance, we split it up and introduce a Routing layer so that we can forward the request to the right instances that actually contain the data
- Pros
  - Scalability, Availability
- Cons
  - Complexity => Making partition mapping, Routing layer to be implemented in the system, Non-uniformity that creates the necessity of Re-Sharding
  - Scatter-Gather problem => Not well suited for Analytical type of queries, as the data is spread across different DB instances