Trying to Understand MongoDB Data Modelling

I am currently taking a Node.js, Express, and MongoDB bootcamp on Udemy by Jonas Schmedtmann (https://www.udemy.com/course/nodejs-express-mongodb-bootcamp/), and I am struggling a bit to understand data modeling. Today I rewatched the two class videos that cover it in detail and took the following notes:

Data Modeling

Real world scenario -> Unstructured Data -> Structured, logical data Model

Steps to Model Data
1. Identify Relationships between Data
  1. 1 to 1
    1. Movie -> Name
  2. 1 to Many
    1. 1: Few
      1. Movie -> Award (a handful)
    2. 1: Many
      1. Movie -> review (thousands)
    3. 1: TON
      1. App -> Log (millions)
  3. Many to Many
    1. Movie -> Actor
      1. A movie can have many actors and an actor can play in many movies
2. Referencing/normalization vs. embedding/denormalization
  1. Referenced / Normalized
    1. Each Data Document is separate
    2. One document references the others by ID
    3. Pro: Better performance when you need to query each document on its own
    4. Con: Need 2 queries to get data from the referenced document
  2. Embedded / Denormalized
    1. Data Documents are combined into a single document
    2. Pro: Good for performance
      1. Can get all data in one query
    3. Con: Impossible to query the embedded document on its own
3. Embedding or referencing other documents
  1. Embedding
    1. Relationship Type
      1. 1: Few
      2. 1: Many
    2. Data Access Patterns
      1. Data is mostly read
      2. Data does not change quickly
      3. (High read/write ratio)
    3. Data Closeness
      1. Data really belongs together
  2. Referencing
    1. Relationship Type
      1. 1: Many
      2. 1: Ton
      3. Many:Many
    2. Data Access Patterns
      1. Data is updated a lot
      2. (low read/write ratio)
    3. Data Closeness
      1. We frequently need to query both datasets on their own
4. Types of Referencing
  1. Child Referencing
    1. The parent contains references to the ids of its children
    2. Uses
      1. 1: Few
  2. Parent Referencing
    1. The child contains a reference to the id of its parent
    2. Uses
      1. 1:Many
      2. 1:Ton
  3. Two-Way Referencing
    1. The parent references its children and the child references its parents
      1. Example: movies and actors
        Movie references the ids of all of the actors in the movie
        Actor references the ids of all of the movies that they have acted in
    2. Uses
      1. Many:Many
5. Important Principles to Consider when deciding
  1. Most Important: Structure your data to match the ways that your application queries and updates data
    1. Identify the questions that arise from your application’s use cases first, and then model your data so that the questions can get answered in the most efficient way
    2. Always favor embedding, unless there is a good reason not to embed.
      1. Especially for 1:Few and 1:Many
  2. 1:Ton or Many:Many is usually a good reason to reference instead of embedding
  3. Favor referencing when data is updated a lot and if you need to frequently access a dataset on its own
  4. Use embedding when data is mostly read but rarely updated, and when two datasets belong intrinsically together
  5. Don’t allow arrays to grow indefinitely. Therefore, if you need to normalize, use child referencing for 1:Many relationships, and parent referencing for 1:Ton relationships
  6. Use Two-Way referencing for Many:Many relationships
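
To make the notes above more concrete for myself, here is a minimal sketch of embedding and the three types of referencing. It assumes plain TypeScript objects standing in for MongoDB documents, with in-memory filtering in place of real queries. Every name and sample value (EmbeddedMovie, awardIds, resolveAwards, and so on) is made up for illustration and does not come from the course or from any library.

```typescript
// Hypothetical shapes: plain objects stand in for MongoDB documents.

// Embedded / denormalized (1:Few): awards live inside the movie document.
interface EmbeddedMovie {
  _id: string;
  title: string;
  awards: { name: string; year: number }[];
}

// Child referencing (1:Few): the parent stores the ids of its children.
interface Award { _id: string; name: string; year: number; }
interface MovieWithAwardRefs { _id: string; title: string; awardIds: string[]; }

// Parent referencing (1:Many, 1:Ton): each child stores the id of its parent.
interface Review { _id: string; movieId: string; rating: number; }

// Two-way referencing (Many:Many): both sides store each other's ids.
interface Movie { _id: string; title: string; actorIds: string[]; }
interface Actor { _id: string; name: string; movieIds: string[]; }

const embeddedMovie: EmbeddedMovie = {
  _id: "m1",
  title: "Example Movie",
  awards: [
    { name: "Best Picture", year: 2020 },
    { name: "Best Score", year: 2020 },
  ],
};
// Embedding: one read of the movie already contains the awards.
const embeddedAwardNames: string[] = embeddedMovie.awards.map((a) => a.name);

const awards: Award[] = [
  { _id: "a1", name: "Best Picture", year: 2020 },
  { _id: "a2", name: "Best Score", year: 2020 },
  { _id: "a3", name: "Best Short", year: 2021 },
];
const movieWithRefs: MovieWithAwardRefs = {
  _id: "m1",
  title: "Example Movie",
  awardIds: ["a1", "a2"],
};

// Child referencing: a second lookup resolves the ids stored on the parent.
function resolveAwards(movie: MovieWithAwardRefs, all: Award[]): Award[] {
  return all.filter((a) => movie.awardIds.indexOf(a._id) !== -1);
}

const reviews: Review[] = [
  { _id: "r1", movieId: "m1", rating: 5 },
  { _id: "r2", movieId: "m1", rating: 3 },
  { _id: "r3", movieId: "m2", rating: 4 },
];

// Parent referencing: the movie holds no array that could grow without
// bound; reviews are found through the parent id each one carries.
function reviewsFor(movieId: string, all: Review[]): Review[] {
  return all.filter((r) => r.movieId === movieId);
}

const movies: Movie[] = [
  { _id: "m1", title: "Example Movie", actorIds: ["p1", "p2"] },
  { _id: "m2", title: "Another Movie", actorIds: ["p1"] },
];
const actors: Actor[] = [
  { _id: "p1", name: "Actor One", movieIds: ["m1", "m2"] },
  { _id: "p2", name: "Actor Two", movieIds: ["m1"] },
];

// Two-way referencing: the relationship can be followed from either side.
function moviesOf(actor: Actor, all: Movie[]): Movie[] {
  return all.filter((m) => actor.movieIds.indexOf(m._id) !== -1);
}
function castOf(movie: Movie, all: Actor[]): Actor[] {
  return all.filter((p) => movie.actorIds.indexOf(p._id) !== -1);
}
```

In this sketch the embedded movie answers "which awards did it win?" from a single object, while the referenced versions each need a second pass over another collection. That second pass is the extra query from the notes.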

Then I tried to model some data for a Poker Home Game Stats Tracking App that I plan to make as a portfolio project.

While modeling the data, I realized I still wasn’t very clear on what data belongs in its own document. For example, the poker app would have a document for users and a document for games. But if I wanted to display a leaderboard with stats, would I need a leaderboard document? I believe so, because I would not want to read every user and every game to rebuild the stats each time someone opens the page. I would likely compile the stats from the data and store them in a chart of some type.
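
Here is a rough sketch of what compiling those stats could look like. It assumes each game embeds its per-player results (1:Few) and refers to users by id. The field names (buyIn, cashOut, net) and the buildLeaderboard helper are my own guesses for this app, not anything from the course, and plain arrays stand in for the real collections.

```typescript
// Hypothetical shapes for the poker app; plain objects stand in for documents.
interface GameResult { userId: string; buyIn: number; cashOut: number; }
interface Game { _id: string; results: GameResult[]; }
interface LeaderboardEntry { userId: string; gamesPlayed: number; net: number; }

// Walk every game once and total up each player's results.
function buildLeaderboard(allGames: Game[]): LeaderboardEntry[] {
  const byUser: { [userId: string]: LeaderboardEntry } = {};
  const entries: LeaderboardEntry[] = [];
  allGames.forEach((game) => {
    game.results.forEach((r) => {
      let entry = byUser[r.userId];
      if (entry === undefined) {
        entry = { userId: r.userId, gamesPlayed: 0, net: 0 };
        byUser[r.userId] = entry;
        entries.push(entry);
      }
      entry.gamesPlayed += 1;
      entry.net += r.cashOut - r.buyIn;
    });
  });
  // Highest net winnings first.
  return entries.sort((a, b) => b.net - a.net);
}

const games: Game[] = [
  {
    _id: "g1",
    results: [
      { userId: "alice", buyIn: 20, cashOut: 50 },
      { userId: "bob", buyIn: 20, cashOut: 0 },
      { userId: "carol", buyIn: 20, cashOut: 10 },
    ],
  },
  {
    _id: "g2",
    results: [
      { userId: "alice", buyIn: 20, cashOut: 0 },
      { userId: "bob", buyIn: 20, cashOut: 40 },
    ],
  },
];

// Compiled once (for example, after a game is saved) instead of on every page view.
const leaderboard: LeaderboardEntry[] = buildLeaderboard(games);
```

The result of buildLeaderboard is the kind of thing I imagine storing as its own leaderboard document, so the page only has to read one precomputed document. The trade-off is that it must be rebuilt or updated whenever a game changes.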
