- Definition
- Defining the architecture, components, modules, interfaces and data for a system to satisfy specified requirements
- Conceptual design -> Logical design -> Physical design (usually Macro -> Micro, but Micro -> Macro also works)
- What is a good design?
- Healthiness
- Simplicity
- No more, no less
- Understandable
- Focus in this lesson: Fundamental questions in system design interviews
- Design the system
- Estimate queries per second (QPS)
- Scale the system
- Design Netflix/YouTube/Spotify
- Tags: Uber, Google, Alibaba
Please design "Netflix" - Macro
- Crack a design in 5 steps:
- Scenario: case/interface
- Necessity: constraint/hypothesis
- Application: service/algorithm
- Kilobit: data
- Evolve
- Scenario: case/interface
- Enumerate (chat w/ interviewer)
- register/login
- play movie
- movie recommendation
- ...
- Sort
- play movie
- get channels
- get movies in channels
- play a movie in a channel
- ...
- Necessity: constraint/hypothesis (mainly estimate from experience, supplemented by asking the interviewer for data)
- Ask: how many active users?
- Predict
- User
- average concurrent users = $$\frac{\text{daily active users}}{\text{seconds per day}} \times \text{average online time}$$
- peak users = average concurrent users * 6 (6 is an empirical value)
- Max peak users in 3 months = peak users * 2 (the user base may grow in the three months after launch)
- Traffic
- Traffic per user = 3Mbps
- Max peak traffic = Max peak users * traffic per user
- Memory
- Memory per user = 10KB
- Max daily memory = daily active users * 2 (growth over 3 months) * 10KB = 100GB (this implies about 5 million daily active users; Redis is OK even at TB scale)
- Storage
- Total # of movies = 14,000
- Movie storage = # of movies * average movie size (multiple versions) = 14,000 * 50GB = 700TB
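The estimates in this step can be checked with a few lines of arithmetic. This is a minimal sketch, not a definitive model: the function name `estimate_capacity` is hypothetical, the 5 million daily active users is back-derived from the 100GB memory figure, the 30 minutes of average online time is an assumed input the notes do not give, and units are decimal (1 GB = 10^6 KB).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_capacity(daily_active_users, avg_online_seconds):
    """Apply the rules of thumb from the notes and return the estimates."""
    avg_concurrent = daily_active_users / SECONDS_PER_DAY * avg_online_seconds
    peak_users = avg_concurrent * 6      # 6 is the empirical peak factor
    max_peak_users = peak_users * 2      # allow for growth in the first 3 months
    return {
        "avg_concurrent_users": avg_concurrent,
        "max_peak_users": max_peak_users,
        # 3 Mbps per user, converted from Mbps to Tbps
        "max_peak_traffic_tbps": max_peak_users * 3 / 1_000_000,
        # 10 KB per user, converted from KB to GB
        "max_daily_memory_gb": daily_active_users * 2 * 10 / 1_000_000,
        # 14,000 movies at 50 GB each, converted from GB to TB
        "movie_storage_tb": 14_000 * 50 / 1_000,
    }


# Assumed inputs: 5 million daily active users, 30 minutes online per day.
estimates = estimate_capacity(5_000_000, 30 * 60)
for name, value in estimates.items():
    print(f"{name}: {value:,.2f}")
```

With these assumed inputs, the memory and storage results match the 100GB and 700TB figures in the notes; the user and traffic numbers depend entirely on the assumed online time.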
- Application: service/algorithm
- Replay the case, add a service for each request
- Merge the services

- Kilobit: data
- Append dataset for each request below a service
- Choose storage types
- User Service - Accounts(MySQL)
- Channel Service - Channel List (MongoDB? -> document store)
- Movie Service - Movies(Files)
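One way to write down the outcome of the Application and Kilobit steps is a small table of services, their datasets and their storage types. This is only a sketch: the service-to-storage pairs come from the notes (MongoDB is marked as uncertain there), while the `ROUTES` mapping from request to service and the helper `storage_for` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    dataset: str
    storage: str


SERVICES = {
    s.name: s
    for s in (
        Service("User Service", "Accounts", "MySQL"),
        Service("Channel Service", "Channel List", "MongoDB"),  # uncertain in the notes
        Service("Movie Service", "Movies", "Files"),
    )
}

# Assumed routing: replay each use case and attach it to one service.
ROUTES = {
    "register/login": "User Service",
    "get channels": "Channel Service",
    "get movies in channels": "Channel Service",
    "play a movie in a channel": "Movie Service",
}


def storage_for(request):
    """Return the storage type behind the service that handles a request."""
    return SERVICES[ROUTES[request]].storage


for request in ROUTES:
    print(f"{request} -> {ROUTES[request]} -> {storage_for(request)}")
```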
- Evolve
- Analyze
- Better: constraints
- Broader: new cases
- Deeper: details
- Views..
- Performance
- Scalability (# of users.. # of machines)
- Robustness
Please design "Netflix" - Micro
- Design recommendation module

- Rule of thumb: 10^6 ~ 10^9 operations take approximately 1s
- Similarity <- bucket sort (inverted index, built with the movie as the key)
- Dispatcher -> load balancer
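The similarity idea above (an inverted index keyed by movie, then a bucket sort on the overlap counts) might look like the sketch below. The data, the function names and the choice of "number of shared movies" as the similarity score are assumptions; the notes do not specify the metric.

```python
from collections import Counter, defaultdict


def build_inverted_index(user_movies):
    """Map each movie to the set of users who watched it."""
    index = defaultdict(set)
    for user, movies in user_movies.items():
        for movie in movies:
            index[movie].add(user)
    return index


def rank_similar_users(target, user_movies, index):
    """Rank other users by how many movies they share with the target."""
    shared = Counter()
    for movie in user_movies[target]:
        for other in index[movie]:
            if other != target:
                shared[other] += 1

    # Bucket sort: a shared count is an integer in 1..len(target's movies),
    # so one bucket per possible count replaces a comparison sort.
    buckets = [[] for _ in range(len(user_movies[target]) + 1)]
    for user, count in shared.items():
        buckets[count].append(user)

    ranked = []
    for count in range(len(buckets) - 1, 0, -1):
        for user in sorted(buckets[count]):  # name order only for stable output
            ranked.append((user, count))
    return ranked


# Hypothetical watch history.
user_movies = {
    "alice": {"m1", "m2", "m3"},
    "bob": {"m1", "m2"},
    "carol": {"m3"},
    "dave": {"m4"},
}
index = build_inverted_index(user_movies)
ranking = rank_similar_users("alice", user_movies, index)
print(ranking)
```

Only users who share at least one movie with the target are ever touched, which is the point of indexing by movie instead of comparing every pair of users.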

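- Resource-intensive workload types:
- Disk-intensive: reads/writes?, e.g. saving crawler results
- CPU-intensive: compute-heavy, e.g. deduplication inside the crawler
- Memory-intensive: e.g. secondary computation over crawler results
- Network-bandwidth-intensive: e.g. the crawler fetching web pages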
Improve Robustness

- The dispatcher may have a mirror
- Feed manager -> timeline manager (news feed)
- Loggers -> Monitor
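The dispatcher mirror mentioned above can be pictured as a simple failover wrapper. This is a sketch under stated assumptions: the class names, the round-robin policy and the `healthy` flag (which a monitor would flip in a real system) are all hypothetical and not taken from the notes.

```python
import itertools


class Dispatcher:
    """Round-robin load balancer over a fixed list of workers."""

    def __init__(self, name, workers):
        self.name = name
        self.healthy = True  # assumed to be flipped by a monitor
        self._next_worker = itertools.cycle(workers)

    def dispatch(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return self.name, next(self._next_worker)


class MirroredDispatcher:
    """Try the primary dispatcher first and fall back to its mirror."""

    def __init__(self, primary, mirror):
        self.dispatchers = (primary, mirror)

    def dispatch(self, request):
        for dispatcher in self.dispatchers:
            try:
                return dispatcher.dispatch(request)
            except RuntimeError:
                continue  # this dispatcher is down, try the next one
        raise RuntimeError("no dispatcher available")


workers = ["worker-1", "worker-2"]
primary = Dispatcher("primary", workers)
mirror = Dispatcher("mirror", workers)
front = MirroredDispatcher(primary, mirror)

first = front.dispatch("play movie")   # served by the primary
primary.healthy = False                # simulate a primary failure
second = front.dispatch("play movie")  # served by the mirror
print(first, second)
```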