Introduction - DBMS

  • Data
    • Data is a collection of raw, unorganized facts and details, such as text, observations, figures, symbols, and descriptions of things
    • Data does not carry any specific purpose and has no significance by itself
    • Data is measured in terms of bits and bytes
    • Types
      • Quantitative
        • Numerical form => Weight, volume, cost of an item
      • Qualitative
        • Descriptive, but not numerical => Name, gender, hair color of a person
  • Information
    • Processed data is called Information
    • It gives the data context and enables decision making
  • Database
    • A database is an electronic system in which data is stored so that it can be easily accessed, managed, and updated
  • DBMS
    • A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data
    • The collection of data, usually referred to as the database, contains information relevant to an enterprise
    • The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.
    • A DBMS is the database itself, along with all the software and functionality around it. It is used to perform operations such as adding, accessing, updating, and deleting data
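The four operations named above (addition, access, updating, deletion) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not part of the original notes: the `student` table, its columns, and the sample rows are made up for the example, and an in-memory database is assumed so nothing is written to disk.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)

# Addition: insert two hypothetical rows.
conn.executemany(
    "INSERT INTO student (id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 74)],
)

# Access: read the rows back.
rows_after_insert = conn.execute(
    "SELECT id, name, marks FROM student ORDER BY id"
).fetchall()

# Updating: change one row.
conn.execute("UPDATE student SET marks = ? WHERE id = ?", (90, 1))

# Deletion: remove the other row.
conn.execute("DELETE FROM student WHERE id = ?", (2,))

rows_final = conn.execute(
    "SELECT id, name, marks FROM student ORDER BY id"
).fetchall()
conn.commit()
conn.close()
```

The programs that interpret these statements, store the rows, and return results are the "set of programs" part of the DBMS definition; the `student` rows are the "collection of interrelated data".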
  • Disadvantages of File System
    • Slow searching and inefficient memory utilization
    • Difficulty in accessing data
    • Concurrency => Data Inconsistency
    • Data Redundancy
    • Data isolation
    • Integrity problems
    • Atomicity problems
    • Security
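To make the atomicity and integrity items above concrete, here is a hedged sketch of how a DBMS handles a case that plain files leave to the application: a transfer that must either happen completely or not at all. The `account` table, the `transfer` helper, and the balances are hypothetical; SQLite via Python's `sqlite3` module is assumed only because it ships with the standard library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is an integrity rule: balances may never go negative.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # both updates are applied
failed = transfer(conn, "A", "B", 500)  # debit violates the CHECK; credit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failing transfer the credit to `B` has already been executed when the debit is rejected, yet the rollback removes it, so no half-finished transfer is ever visible. A file-based program would have to implement this recovery logic by hand.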
  • CAP Theorem
    • Concept in Distributed Databases
    • The CAP theorem states that a distributed system can only provide two of three properties simultaneously: consistency, availability, and partition tolerance
    • CAP
      • Consistency
        • In a consistent system, all nodes see the same data simultaneously
        • A read sent to any node should return the same data
      • Availability
        • Availability means that the system remains operational at all times
        • Every request will get a response regardless of the individual state of the nodes
        • Unlike in a consistent system, there’s no guarantee that the response reflects the most recent write operation
      • Partition Tolerance
        • When a distributed system encounters a partition, it means that there’s a break in communication between nodes
        • If a system is partition-tolerant, the system does not fail, regardless of whether messages are dropped or delayed between nodes within the system
        • To have partition tolerance, the system must replicate records across combinations of nodes and networks
    • NoSQL Databases => Well suited to distributed networks; they allow for horizontal scaling and can quickly scale across multiple nodes
      • CA Databases
        • CA databases enable consistency and availability across all nodes
        • Unfortunately, CA databases can’t deliver fault tolerance
        • In any distributed system, partitions are bound to happen, which means this type of database isn’t a very practical choice
        • Some relational databases, such as MySQL or PostgreSQL, provide consistency and availability when they run in a setup where partitions cannot occur (for example, on a single node)
      • CP Databases
        • CP databases enable consistency and partition tolerance, but not availability
        • When a partition occurs, the system has to turn off inconsistent nodes until the partition can be fixed
        • MongoDB is an example of a CP database
        • MongoDB is structured so that only one primary node receives all of the write requests in a given replica set
        • Secondary nodes replicate the data in the primary node, so if the primary node fails, a secondary node can stand in
      • AP Databases
        • AP databases enable availability and partition tolerance, but not consistency
        • In the event of a partition, all nodes remain available, but some of them may return stale data
        • When the partition is eventually resolved, most AP databases will sync the nodes to ensure consistency across them
        • Apache Cassandra is an example of an AP database
          • It’s a NoSQL database with no primary node, meaning that all of the nodes remain available
          • Cassandra provides eventual consistency: once a partition is resolved, the nodes can be re-synced so that they converge on the same data
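The difference between CP and AP behaviour during a partition can be sketched with a toy two-replica store. This is an illustrative model under loud assumptions, not how MongoDB or Cassandra are implemented: the `Cluster` class, its `mode` flag, the `partitioned` switch, and the last-write-wins reconciliation in `heal` are all invented for the example. The CP branch simply refuses writes while partitioned, which is a simplification of "turning off inconsistent nodes".

```python
class Cluster:
    """Two replicas of a key-value store, run in either 'CP' or 'AP' mode."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]  # each maps key -> (version, value)
        self.partitioned = False
        self.version = 0          # logical clock for last-write-wins

    def write(self, replica, key, value):
        # CP: give up availability, reject writes that cannot reach every replica.
        if self.partitioned and self.mode == "CP":
            return False
        self.version += 1
        record = (self.version, value)
        # AP during a partition: only the contacted replica gets the write.
        targets = [replica] if self.partitioned else range(len(self.replicas))
        for i in targets:
            self.replicas[i][key] = record
        return True

    def read(self, replica, key):
        record = self.replicas[replica].get(key)
        return None if record is None else record[1]

    def heal(self):
        """End the partition and reconcile replicas (newest version wins)."""
        self.partitioned = False
        for key in set().union(*self.replicas):
            newest = max(r[key] for r in self.replicas if key in r)
            for r in self.replicas:
                r[key] = newest


cp = Cluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(0, "x", 2)              # rejected: consistency kept
cp_reads = (cp.read(0, "x"), cp.read(1, "x"))  # both replicas still agree

ap = Cluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(0, "x", 2)              # accepted: availability kept
ap_reads_during = (ap.read(0, "x"), ap.read(1, "x"))  # replicas disagree
ap.heal()
ap_reads_after = (ap.read(0, "x"), ap.read(1, "x"))   # converged again
```

The AP run shows the window in which two nodes answer differently, followed by convergence once the partition is resolved, which is the "eventual" part of eventual consistency.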
  • BASE property
    • Basically Available
      • System remains operational and provides basic functionality even in the presence of failures or partitioning
    • Soft state
      • The state of the system may change over time, even without any input or activity
    • Eventually consistent
      • The system guarantees that the data will eventually become consistent, but there may be a temporary period of inconsistency
  • Master-Slave Architecture
    • Master-Slave is a general way to optimize I/O in a system where the number of requests grows so high that a single DB server cannot handle them efficiently
    • The true (latest) data is kept in the master DB, so write operations are directed there; read operations are served only from the slaves
    • This architecture helps safeguard site reliability and availability and reduces latency
    • If a site receives a lot of traffic and the only available database is one master, it will be overloaded with read and write requests
      • This makes the entire system slow for everyone on the site
    • DB replication takes care of distributing data from the master machine to the slave machines
      • This can be synchronous or asynchronous depending upon the system’s need
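The routing rule described above (writes go to the master, reads go to the slaves, replication copies data across) can be sketched as a small in-memory model. Everything here is hypothetical: the `ReplicatedDB` class, the round-robin choice of slave, and the `replicate` method that stands in for a real replication stream. Plain dictionaries play the role of database servers.

```python
from collections import deque
from itertools import cycle


class ReplicatedDB:
    """One master and several read-only slaves, modelled as dictionaries."""

    def __init__(self, n_slaves=2, synchronous=True):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]
        self.synchronous = synchronous
        self.pending = deque()                    # changes not yet on the slaves
        self._next_slave = cycle(range(n_slaves))  # round-robin read routing

    def write(self, key, value):
        """All writes are directed to the master."""
        self.master[key] = value
        self.pending.append((key, value))
        if self.synchronous:
            self.replicate()  # slaves are updated before write() returns

    def replicate(self):
        """Apply every pending change to every slave."""
        while self.pending:
            key, value = self.pending.popleft()
            for slave in self.slaves:
                slave[key] = value

    def read(self, key):
        """Reads are served only from slaves, spreading the load."""
        slave = self.slaves[next(self._next_slave)]
        return slave.get(key)


sync_db = ReplicatedDB(synchronous=True)
sync_db.write("k", "v1")
sync_read = sync_db.read("k")    # slaves already have the value

async_db = ReplicatedDB(synchronous=False)
async_db.write("k", "v1")
stale_read = async_db.read("k")  # replication has not run yet
async_db.replicate()
fresh_read = async_db.read("k")  # slaves have caught up
```

The asynchronous case shows the trade-off: writes return sooner because they do not wait for the slaves, but a read that arrives before replication completes can miss the latest data.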