Design Tiny URL
- Scenario
- Insert: short_url Insert(long url)
- Lookup: long_url lookup(short url)
- Necessity
- 1M daily users
- Insert
- per day: 1M * 1%(function usage) * 10(function freq) = 100,000
- per year:365.M(disk requirement)
- per second: 100, 000/86400 = 1.2
- Lookup
- per day: 1M * 100% * 3 = 3M
- per second: 3M/86400 = 35
- Algorithm
class Shortener{ unordered_map<string, string> mLongToShort; unordered_map<string, string> mShortToLong; string insert(string longURL){ if(mLongToShort.count(longURL) < 1){ string shortURL = GenerateShortURL(); mLongToShort[longURL] = shortURL; mShortToLong[shortURL] = longURL; } return shortURL; } }cont'd algorithm
- generate short url
// the simplest way string GenerateShortURL(){ return string(mLongToShort.size()); } // reduce the size of shortURL // more characters![a-z][A-Z] string GenerateShortURL(){ return convertTo62(mLongToShort.size()); } string convertTo62(int input){ char encode[62] = {'0',...,'9', 'a', ..., 'z', 'A', ... 'Z'}; string ret = ""; while(input){ ret = encode[input % 62] + ret; input /= 62; } return ret; }

- Data
- average size of longURL = 100bytes
- average size of shortURL = 4 bytes(int)
- state = 4byte
- daily new URL = 100,000 * 108 = 10.8MB
- yearly new URL = 10.8 * 365 = 4GB
- Followup
- birthday attack?
- support random
- how to avoid conflicts
- how to implement time-limited service?
- state: valid time/expire time
- how to cache
- pre-load(only 4GB!)
- limited RAM? using LRU!
- partition
- birthday attack?