Manav Goyal

Introduction - DBMS

Data
- Data is a collection of raw, unorganized facts and details, such as text, observations, figures, symbols, and descriptions of things
- Data does not carry any specific purpose and has no significance by itself
- Data is measured in terms of bits and bytes
- Types
  - Quantitative
    - Numerical form => Weight, volume, cost of an item
  - Qualitative
    - Descriptive, but not numerical => Name, gender, hair color of a person
Information
- Processed data is called information
- It provides context for the data and enables decision making
Database
- A database is an electronic system in which data is stored so that it can be easily accessed, managed, and updated
DBMS
- A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data
- The collection of data, usually referred to as the database, contains information relevant to an enterprise
- The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.
- A DBMS is the database itself, along with all the software and functionality around it. It is used to perform operations such as adding, accessing, updating, and deleting data
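The four basic operations listed above (addition, access, updating, deletion) can be sketched with SQLite through Python's built-in `sqlite3` module. SQLite is only an assumed stand-in DBMS here, and the `items` table with its columns is hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in DBMS; the "items" table is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL, cost REAL)")

# Addition: insert two rows (? placeholders avoid building SQL from strings)
cur.executemany("INSERT INTO items (name, cost) VALUES (?, ?)", [("pen", 1.5), ("notebook", 3.0)])

# Updating: change the cost of one item
cur.execute("UPDATE items SET cost = ? WHERE name = ?", (2.0, "pen"))

# Deletion: remove one item
cur.execute("DELETE FROM items WHERE name = ?", ("notebook",))
conn.commit()  # make the changes permanent

# Access: query what is left
rows = cur.execute("SELECT name, cost FROM items ORDER BY id").fetchall()
print(rows)  # [('pen', 2.0)]
conn.close()
```

The program never touches files or byte offsets directly; the DBMS decides how the rows are stored and found.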
Disadvantages of File System
- Slow searching and inefficient memory utilization
- Difficulty in accessing data
- Concurrent-access anomalies => Data inconsistency
- Data redundancy
- Data isolation
- Integrity problems
- Atomicity problems
- Security problems
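To illustrate the atomicity problem, here is a small sketch of how a DBMS avoids it. A bank transfer that crashes between the debit and the credit would leave a plain file half-updated, whereas a DBMS transaction rolls back the partial change. The example assumes SQLite via Python's `sqlite3`; the `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:  # the connection's context manager commits on success
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])


def balances(conn):
    """Return {account name: balance}."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


def transfer(conn, src, dst, amount, fail_midway=False):
    """Hypothetical helper: move `amount` from src to dst as one transaction."""
    with conn:  # commits if the block finishes, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between debit and credit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


try:
    transfer(conn, "A", "B", 30, fail_midway=True)
except RuntimeError:
    pass
after_crash = balances(conn)    # {'A': 100, 'B': 50}: the half-done debit was undone

transfer(conn, "A", "B", 30)
after_success = balances(conn)  # {'A': 70, 'B': 80}
print(after_crash, after_success)
```

Either both updates take effect or neither does, which is exactly the guarantee a hand-written file format would have to implement on its own.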
CAP Theorem
- Concept in Distributed Databases
- The CAP theorem states that a distributed system can simultaneously guarantee at most two of three properties: consistency, availability, and partition tolerance
- CAP
  - Consistency
    - In a consistent system, all nodes see the same data simultaneously
    - A read from any node returns the most recent write, so all nodes return the same data
  - Availability
    - It means that the system remains operational all of the time
    - Every request will get a response regardless of the individual state of the nodes
    - Unlike in a consistent system, there’s no guarantee that the response reflects the most recent write
  - Partition Tolerance
    - A partition is a break in communication between nodes of a distributed system
    - If a system is partition-tolerant, the system does not fail, regardless of whether messages are dropped or delayed between nodes within the system
    - To have partition tolerance, the system must replicate records across combinations of nodes and networks
- Classifying databases by CAP (NoSQL databases in particular are well suited to distributed networks: they allow horizontal scaling and can quickly scale across multiple nodes)
  - CA Databases
    - CA databases enable consistency and availability across all nodes
    - Unfortunately, CA databases can’t deliver partition tolerance
    - In any distributed system, partitions are bound to happen, which means this type of database isn’t a very practical choice
    - Some relational databases, such as MySQL or PostgreSQL in a single-node setup, provide consistency and availability
  - CP Databases
    - CP databases enable consistency and partition tolerance, but not availability
    - When a partition occurs, the system has to make inconsistent nodes unavailable until the partition is resolved
    - MongoDB is an example of a CP database
    - MongoDB is structured so that each replica set has a single primary node that receives all write requests
    - Secondary nodes replicate the primary’s data, so if the primary fails, a secondary can be elected to take its place
  - AP Databases
    - AP databases enable availability and partition tolerance, but not consistency
    - In the event of a partition, all nodes remain available, but some may not have the latest data
    - When the partition is eventually resolved, most AP databases will sync the nodes to ensure consistency across them
    - Apache Cassandra is an example of an AP database
      - It’s a NoSQL database with no primary node, meaning that all of the nodes remain available
      - Cassandra offers eventual consistency: after a partition is resolved, nodes re-sync their data (for example, through hinted handoff, read repair, and anti-entropy repair)
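The CP and AP behaviour described above can be made concrete with a toy two-replica key-value store. This is a deliberately simplified model, not how MongoDB or Cassandra are implemented: with only two replicas there is no majority side, so the hypothetical `ToyCluster` in CP mode refuses requests during a partition, while in AP mode it keeps answering and may return stale data.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ToyCluster:
    """Hypothetical two-replica store used only to contrast CP and AP choices."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.nodes = [Replica("n1"), Replica("n2")]
        self.partitioned = False  # True => n1 and n2 cannot exchange messages

    def write(self, node_idx, key, value):
        local, peer = self.nodes[node_idx], self.nodes[1 - node_idx]
        if self.partitioned:
            if self.mode == "CP":
                # Cannot replicate, so refuse rather than let the replicas diverge.
                raise RuntimeError("unavailable: write rejected during partition")
            local.data[key] = value  # AP: accept locally; the peer stays stale
            return
        local.data[key] = value
        peer.data[key] = value  # no partition: replicate immediately

    def read(self, node_idx, key):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: read rejected during partition")
        return self.nodes[node_idx].data.get(key)


cp = ToyCluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"  # consistency kept, availability lost

ap = ToyCluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)      # accepted on n1 only
stale = ap.read(1, "x")  # 1: n2 has not seen the latest write
print(cp_result, stale)
```

The same partition produces an error in one mode and a stale answer in the other; neither mode can give both a response and the latest value.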
BASE property
- Basically Available
  - System remains operational and provides basic functionality even in the presence of failures or partitioning
- Soft state
  - The state of the system may change over time, even without any input or activity
- Eventually consistent
  - The system guarantees that the data will eventually become consistent, but there may be a temporary period of inconsistency
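A minimal sketch of eventual consistency follows, assuming a last-write-wins (LWW) rule with logical timestamps. The notes do not specify a convergence strategy; LWW is one common choice, and the `LWWReplica` class and `anti_entropy` function are hypothetical. Two replicas accept conflicting writes, disagree for a while, then converge after exchanging state.

```python
from dataclasses import dataclass, field


@dataclass
class LWWReplica:
    name: str
    # key -> (version, value); version = (logical timestamp, writer name),
    # so ties on the timestamp are broken deterministically by name
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self.store[key] = ((ts, self.name), value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state so both replicas keep the newest version of every key."""
    for key in set(a.store) | set(b.store):
        candidates = [r.store[key] for r in (a, b) if key in r.store]
        newest = max(candidates, key=lambda entry: entry[0])
        a.store[key] = newest
        b.store[key] = newest


r1, r2 = LWWReplica("r1"), LWWReplica("r2")
r1.write("cart", ["book"], ts=1)
r2.write("cart", ["book", "pen"], ts=2)  # later write accepted by another replica
before_sync = (r1.read("cart"), r2.read("cart"))  # temporarily inconsistent
anti_entropy(r1, r2)
after_sync = (r1.read("cart"), r2.read("cart"))   # both now ['book', 'pen']
print(before_sync, after_sync)
```

LWW is simple but silently discards the older of two concurrent writes; systems that cannot tolerate that loss need a richer merge rule.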
Master-Slave Architecture
- Master-slave is a general way to optimize I/O in a system where the number of requests is so high that a single DB server cannot handle them efficiently
- The master DB holds the authoritative, latest data, so all write operations are directed to it; read operations are served by the slaves
- This architecture improves site reliability and availability and reduces latency
- If a site receives a lot of traffic and its only database is a single master, that master will be overloaded with read and write requests
  - This makes the entire system slow for everyone on the site
- Database replication distributes data from the master machine to the slave machines
  - This can be synchronous (the master waits for the slaves to apply each write) or asynchronous (the slaves may briefly lag behind), depending on the system’s needs
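The read/write split and the two replication modes can be sketched as a toy in-process router. `MasterSlaveCluster`, its round-robin read policy, and the explicit `replicate()` step are hypothetical simplifications: real systems replicate over the network, usually from a background process.

```python
import itertools


class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    """Hypothetical router: writes go to the master, reads rotate over the slaves."""

    def __init__(self, num_slaves, synchronous):
        self.master = Node("master")
        self.slaves = [Node(f"slave{i}") for i in range(num_slaves)]
        self.synchronous = synchronous
        self.pending = []  # changes not yet applied to the slaves (async mode only)
        self._next_slave = itertools.cycle(self.slaves)

    def write(self, key, value):
        self.master.data[key] = value
        if self.synchronous:
            # Synchronous: the write completes only after every slave has applied it.
            for s in self.slaves:
                s.data[key] = value
        else:
            # Asynchronous: acknowledge now and ship the change later.
            self.pending.append((key, value))

    def replicate(self):
        """Apply queued changes to the slaves (stands in for background replication)."""
        for key, value in self.pending:
            for s in self.slaves:
                s.data[key] = value
        self.pending.clear()

    def read(self, key):
        slave = next(self._next_slave)  # round-robin load balancing across slaves
        return slave.data.get(key)


async_db = MasterSlaveCluster(num_slaves=2, synchronous=False)
async_db.write("user:1", "Alice")
lagging = async_db.read("user:1")    # None: replication has not run yet
async_db.replicate()
caught_up = async_db.read("user:1")  # 'Alice'

sync_db = MasterSlaveCluster(num_slaves=2, synchronous=True)
sync_db.write("user:1", "Alice")
immediate = sync_db.read("user:1")   # 'Alice'
print(lagging, caught_up, immediate)
```

The trade-off is visible in `lagging`: asynchronous replication keeps writes fast but lets a slave serve stale reads, while synchronous replication makes each write wait for every slave.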