I am currently taking a Node.js, Express, and MongoDB bootcamp on Udemy by Jonas Schmedtmann (https://www.udemy.com/course/nodejs-express-mongodb-bootcamp/). I am struggling a bit to understand data modeling, so today I rewatched the two class videos that cover it in detail and took the following notes:
Data Modeling
Real world scenario -> Unstructured Data -> Structured, logical data Model
- Steps to Model Data
  - Identify Relationships between Data
    - 1 to 1
      - Movie -> Name
    - 1 to Many
      - 1:Few
        - Movie -> Awards (a few)
      - 1:Many
        - Movie -> Reviews (thousands)
      - 1:Ton
        - App -> Logs (millions)
    - Many to Many
      - Movie -> Actor
        - A movie can have many actors and an actor can play in many movies
  - Referencing/Normalization vs. Embedding/Denormalization
    - Referenced / Normalized
      - Each data document is separate
      - One document references the others by ID
      - Pro: Better performance when each document needs to be queried on its own
      - Con: Need 2 queries to get data from a referenced document
    - Embedded / Denormalized
      - Data documents are combined into a single document
      - Pro: Good performance, because all the data can be fetched in one query
      - Con: Impossible to query the embedded document on its own
  - Embedding or Referencing Other Documents
    - Embedding
      - Relationship Type
        - 1:Few
        - 1:Many
      - Data Access Patterns
        - Data is mostly read
        - Data does not change quickly
        - (high read/write ratio)
      - Data Closeness
        - Data really belongs together
    - Referencing
      - Relationship Type
        - 1:Many
        - 1:Ton
        - Many:Many
      - Data Access Patterns
        - Data is updated a lot
        - (low read/write ratio)
      - Data Closeness
        - We frequently need to query both datasets on their own
  - Types of Referencing
    - Child Referencing
      - The parent contains references to the ids of its children
      - Uses
        - 1:Few
    - Parent Referencing
      - The child contains a reference to the id of its parent
      - Uses
        - 1:Many
        - 1:Ton
    - Two-Way Referencing
      - The parent references its children and the child references its parents
      - Example: movies and actors
        - Movie references the ids of all of the actors in the movie
        - Actor references the ids of all of the movies that they have acted in
      - Uses
        - Many:Many
  - Important Principles to Consider When Deciding
    - Most Important: Structure your data to match the ways that your application queries and updates data
      - Identify the questions that arise from your application’s use cases first, and then model your data so that the questions can get answered in the most efficient way
    - Always favor embedding, unless there is a good reason not to embed
      - Especially for 1:Few and 1:Many
      - 1:Ton or Many:Many is usually a good reason to reference instead of embedding
    - Favor referencing when data is updated a lot and when you need to frequently access a dataset on its own
    - Use embedding when data is mostly read but rarely updated, and when two datasets belong intrinsically together
    - Don’t allow arrays to grow indefinitely. Therefore, if you need to normalize, use child referencing for 1:Many relationships, and parent referencing for 1:Ton relationships
    - Use Two-Way referencing for Many:Many relationships
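To make the notes above more concrete for myself, here is a rough sketch of what embedding and the three types of referencing could look like. It uses plain JavaScript objects as stand-ins for MongoDB documents, so it runs without a database. Every id, field name, and helper function here is made up for illustration and is not taken from the course.

```javascript
// Plain objects standing in for MongoDB documents.
// All ids, field names, and helpers are hypothetical.

// Embedding (1:Few): the awards live inside the movie document.
const embeddedMovie = {
  _id: 'movie1',
  title: 'Example Movie',
  awards: [{ name: 'Best Picture' }, { name: 'Best Score' }],
};

// Child referencing: the parent holds the ids of its children.
const movieWithReviewIds = {
  _id: 'movie1',
  title: 'Example Movie',
  reviews: ['review1', 'review2'],
};

// Parent referencing: each child holds the id of its parent,
// so the parent document never grows, however many children exist.
const reviews = [
  { _id: 'review1', movie: 'movie1', rating: 5 },
  { _id: 'review2', movie: 'movie1', rating: 3 },
  { _id: 'review3', movie: 'movie2', rating: 4 },
];

// Two-way referencing (Many:Many): both sides hold the other's ids.
const movies = [
  { _id: 'movie1', title: 'Example Movie', actors: ['actor1', 'actor2'] },
  { _id: 'movie2', title: 'Another Movie', actors: ['actor1'] },
];
const actors = [
  { _id: 'actor1', name: 'Actor One', movies: ['movie1', 'movie2'] },
  { _id: 'actor2', name: 'Actor Two', movies: ['movie1'] },
];

// Embedded data comes back with the parent: one lookup is enough.
function awardNames(movie) {
  return movie.awards.map((award) => award.name);
}

// Parent-referenced data needs a second lookup against the children.
function reviewsForMovie(movieId, allReviews) {
  return allReviews.filter((review) => review.movie === movieId);
}

// Two-way referencing can be followed from either side; here from the actor.
function moviesForActor(actorId, allActors, allMovies) {
  const actor = allActors.find((a) => a._id === actorId);
  return allMovies
    .filter((movie) => actor.movies.includes(movie._id))
    .map((movie) => movie.title);
}

console.log(awardNames(embeddedMovie));
console.log(reviewsForMovie('movie1', reviews).length);
console.log(moviesForActor('actor1', actors, movies));
```

In a real app the two `filter` helpers would be separate database queries, which is the "need 2 queries" cost of referencing from my notes.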
Then I tried to model some data for a Poker Home Game Stats Tracking App that I plan to make as a portfolio project.
While modeling the data I realized I still wasn’t very clear on what data belongs in a new document. For example, in the poker app there would be a document for users and a document for games. But if I wanted to display a leaderboard with stats in it, would I need a leaderboard document? I believe so, because I would not want to look up all of the users and games to rebuild the stats each time a user opened the page. Instead, I would likely have the stats compiled from the data and stored in a chart of some type.
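Here is a minimal sketch of that idea, again with plain JavaScript objects instead of a real database. The field names (`results`, `buyIn`, `cashOut`, `net`) and the `buildLeaderboard` function are my own assumptions about how the app might look, not a finished design. Each game embeds its handful of results (1:Few) and each result references a user by id.

```javascript
// Hypothetical poker app data; every field name here is an assumption.
const users = [
  { _id: 'u1', name: 'Alice' },
  { _id: 'u2', name: 'Bob' },
];

// Each game embeds its results, and each result references a user by id.
const games = [
  {
    _id: 'g1',
    results: [
      { user: 'u1', buyIn: 20, cashOut: 35 },
      { user: 'u2', buyIn: 20, cashOut: 5 },
    ],
  },
  {
    _id: 'g2',
    results: [
      { user: 'u1', buyIn: 20, cashOut: 10 },
      { user: 'u2', buyIn: 20, cashOut: 30 },
    ],
  },
];

// Compile the stats once and return a single leaderboard document
// that could be stored and then read with one query.
function buildLeaderboard(allUsers, allGames) {
  const stats = new Map(
    allUsers.map((u) => [u._id, { user: u._id, name: u.name, gamesPlayed: 0, net: 0 }])
  );
  for (const game of allGames) {
    for (const result of game.results) {
      const row = stats.get(result.user);
      if (!row) continue; // skip results that point at an unknown user
      row.gamesPlayed += 1;
      row.net += result.cashOut - result.buyIn;
    }
  }
  // Highest net winnings first.
  const rows = [...stats.values()].sort((a, b) => b.net - a.net);
  return { _id: 'leaderboard', rows };
}

const leaderboard = buildLeaderboard(users, games);
console.log(leaderboard.rows);
```

The idea would be to rebuild this document whenever a game is saved rather than on every page view, since the leaderboard is read far more often than games are added (a high read/write ratio).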