Design Tiny URL

  • Scenario
    • Insert: short_url Insert(long url)
    • Lookup: long_url lookup(short url)
  • Necessity
    • 1M daily users
    • Insert
      • per day: 1M * 1%(function usage) * 10(function freq) = 100,000
      • per year:365.M(disk requirement)
      • per second: 100, 000/86400 = 1.2
    • Lookup
      • per day: 1M * 100% * 3 = 3M
      • per second: 3M/86400 = 35
  • Algorithm
  • class Shortener{
        unordered_map<string, string> mLongToShort;
        unordered_map<string, string> mShortToLong;
    
        string insert(string longURL){
            if(mLongToShort.count(longURL) < 1){
                string shortURL = GenerateShortURL();
                mLongToShort[longURL] = shortURL;
                mShortToLong[shortURL] = longURL;
            }
            return shortURL;
        }
    }
    
  • cont'd algorithm

    • generate short url
    • // the simplest way
      string GenerateShortURL(){
          return string(mLongToShort.size());
      }
      
      // reduce the size of shortURL
      // more characters![a-z][A-Z]
      string GenerateShortURL(){
          return convertTo62(mLongToShort.size());
      }
      
      string convertTo62(int input){
          char encode[62] = {'0',...,'9', 'a', ..., 'z', 'A', ... 'Z'};
          string ret = "";
          while(input){
              ret = encode[input % 62] + ret;
              input /= 62;
          }
          return ret;
      }
      

  • Data
    • average size of longURL = 100bytes
    • average size of shortURL = 4 bytes(int)
    • state = 4byte
    • daily new URL = 100,000 * 108 = 10.8MB
    • yearly new URL = 10.8 * 365 = 4GB
  • Followup
    • birthday attack?
      • support random
    • how to avoid conflicts
    • how to implement time-limited service?
      • state: valid time/expire time
    • how to cache
      • pre-load(only 4GB!)
      • limited RAM? using LRU!
      • partition

results matching ""

    No results matching ""