Item-based Collaborative Filtering

Item-based Collaborative Filtering

一 Item-based的工作流程

Item similarity is a route to computing the prediction of a user’s item preference in this post.

1.1 Compute similarity between pairs of items

1.1.1 Picking Neighbors

  • Usually, pick the k most similar items that the user has rated.
  • Set k=20 often works well.(k too small, inaccurate scores; too large, too much noise introduced by low-similarity items)

    1.1.2 Item Similarities

  • Usually use cosine similarity between item rating vectors(normalize user ratings first). Pre-compute similarities for all pairs of items.

  • Since user hasn't rated everything, need M >> k neighbors per item in model,other than O(|I| * |I|).

  • sometimes similarity is damped, but often doesn't help much
sim(i,j)=uU(i)\capU(j)r^uir^ujur^2uiur^2uj

1.2 Predict user-item rating

Compute weighted average of user's ratings for the k most similar items.

pui=jNsim(i,j)rujjN|sim(i,j)|
  • Weighted sum of rated “item-neighbors"
  • Linear regression to estimate rating

二 Item-based v.s. User-based

User-based的劣势:

From Computational performance's prospective:

  • With millions of users (or more), computing allpairs correlations is expensive.
    通常推荐系统的user数量多于item数量.使用User-based方法时,计算用户的近邻用户的计算量很大.即使采用增量更新,也要牺牲时间代价.
  • User profiles could change quickly – needed to compute in real time to keep users statisfied 与content-based不同,collaborative filtering需要通过用户的行为建立user的profiles,一旦用户行为的更新到一定量之后,需要重新构建profiles.

Item-based's Insight:

  • Item-Item similarity is fairly stable
    This is dependent on having many more users than items
    —— Average item has many more ratings than an average user
    —— items don’t generally change rapidly – at least not in ratings space (special case for time-bound
    items)

Item-based的优势:

  • It actually works quite well
    Good MAE performance on prediction; good rank performance on top-N
  • Efficient implementation
    当user个数远大于item个数时: |U| >> |I|. 注意到,对user推荐item时, 搜索的近邻items是在user打过分的items中寻找的.因此该计算量并不会很大.
    此外,item-item的相似度矩阵可以实现计算好.

Item-based's limitation:

lower serendipity, not fully studied; intuition is clear


四 课程材料

vedios
slides
code

下一节的Latent factor model的课程材料一并在这里贴出,不再单独写notes.

vedios
slides
code


推荐阅读

  1. Item-to-Item Collaborative Filtering with Amazon's Recommendation System
  2. Ekstrand M D, Kannan P, Stemper J A, et al. Automatically building research reading lists[C]//Proceedings of the fourth ACM conference on Recommender systems. ACM, 2010: 159-166.
Item-based%20Collaborative%20Filtering%0A%3D%3D%3D%0A@%5Brecsys%7Cpublished%5D%20%20%0A%0A%23%23%u4E00%20Item-based%u7684%u5DE5%u4F5C%u6D41%u7A0B%20%20%0AItem%20similarity%20is%20a%20route%20to%20computing%20the%20prediction%20of%20a%20user%u2019s%20item%20preference%20in%20this%20post.%20%20%0A%23%23%231.1%20Compute%20similarity%20between%20pairs%20of%20items%20%20%0A%23%23%23%231.1.1%20Picking%20Neighbors%20%20%0A%20%20%20%20%0A-%20Usually%2C%20pick%20the%20*k*%20most%20similar%20items%20that%20the%20user%20has%20rated.%20%20%0A-%20Set%20*k*%3D20%20often%20works%20well.%28*k*%20too%20small%2C%20inaccurate%20scores%3B%20too%20large%2C%20too%20much%20noise%20introduced%20by%20low-similarity%20items%29%20%20%0A%23%23%23%231.1.2%20Item%20Similarities%20%20%0A-%20Usually%20use%20cosine%20similarity%20between%20item%20rating%20vectors%28normalize%20user%20ratings%20first%29.%20%20Pre-compute%20similarities%20for%20all%20pairs%20of%20items.%20%20%0A%20%20%20%20%0A-%20Since%20user%20hasn%27t%20rated%20everything%2C%20need%20M%20%3E%3E%20k%20neighbors%20per%20item%20in%20model%2Cother%20than%20O%28%7CI%7C%20*%20%7CI%7C%29.%20%20%0A-%20sometimes%20similarity%20is%20damped%2C%20but%20often%20doesn%27t%20help%20much%0A%0A%60%60%60mathjax%0A%20%20%20%20sim%28i%2Cj%29%3D%5Cfrac%7B%5Csum_%7Bu%20%5Cin%20U%28i%29%5CcapU%28j%29%7D%5Chat%7Br%7D_%7Bui%7D%5Chat%7Br%7D_%7Buj%7D%7D%7B%5Csqrt%7B%5Csum_%7Bu%7D%5Chat%7Br%7D%5E2_%7Bui%7D%7D%5Csqrt%7B%5Csum_%7Bu%7D%5Chat%7Br%7D%5E2_%7Buj%7D%7D%7D%0A%60%60%60%20%20%0A%0A%23%23%231.2%20Predict%20user-item%20rating%20%20%0ACompute%20weighted%20average%20of%20user%27s%20ratings%20for%20the%20*k*%20most%20similar%20items.%20%20%0A%60%60%60mathjax%0A%20%20%20%20p_%7Bui%7D%3D%5Cfrac%7B%5Csum_%7Bj%5Cin%20N%7Dsim%28i%2Cj%29r_%7Buj%7D%7D%7B%5Csum_%7Bj%20%5Cin%20N%7D%7Csim%28i%2Cj%29%7C%7D%0A%60%60%60%20%20%0A%0A-%20Weighted%20sum%20of%20rated%20%u201Citem-neighbors%u201D%20%20%0A-%20Linear%20regression%20to%20estimate%20rating%0A%0A%23%23%u4E8C%20Item-based%20v.s.%20User-based%20%20%0A%23%23%23User-based%u7684%u52A3%u52BF%3A%0AFrom%20Computational%20performance%27s%20prospective%3A%0A*%20With%20millions%20of%20users%20%28or%20more%29%2C%20**computing%20allpairs%20correlations%20is%20expensive.**%20%20%0A%20%20%20%20%u901A%u5E38%u63A8%u8350%u7CFB%u7EDF%u7684user%u6570%u91CF%u591A%u4E8Eitem%u6570%u91CF.%u4F7F%u7528User-based%u65B9%u6CD5%u65F6%2C%u8BA1%u7B97%u7528%u6237%u7684%u8FD1%u90BB%u7528%u6237%u7684%u8BA1%u7B97%u91CF%u5F88%u5927.%u5373%u4F7F%u91C7%u7528%u589E%u91CF%u66F4%u65B0%2C%u4E5F%u8981%u727A%u7272%u65F6%u95F4%u4EE3%u4EF7.%20%0A*%20**User%20profiles%20could%20change%20quickly**%20%u2013%20needed%20to%20compute%20in%20real%20time%20to%20keep%20users%20statisfied%20%20%20%20%20%u4E0Econtent-based%u4E0D%u540C%2Ccollaborative%20filtering%u9700%u8981%u901A%u8FC7%u7528%u6237%u7684%u884C%u4E3A%u5EFA%u7ACBuser%u7684profiles%2C%u4E00%u65E6%u7528%u6237%u884C%u4E3A%u7684%u66F4%u65B0%u5230%u4E00%u5B9A%u91CF%u4E4B%u540E%2C%u9700%u8981%u91CD%u65B0%u6784%u5EFAprofiles.%20%20%0A%0A%23%23%23Item-based%27s%20Insight%3A%0A*%20**Item-Item%20similarity%20is%20fairly%20stable**%20%20%0A%20%20%20%20This%20is%20dependent%20on%20having%20many%20more%20users%20than%20items%20%20%0A%20%20%20%20%u2014%u2014%20Average%20item%20has%20many%20more%20ratings%20than%20an%20average%20user%20%20%20%0A%20%20%20%20%u2014%u2014%20items%20don%u2019t%20generally%20change%20rapidly%20%u2013%20at%20least%20not%20in%20ratings%20space%20%28special%20case%20for%20time-bound%0Aitems%29%20%20%20%20%0A%0A%23%23%23Item-based%u7684%u4F18%u52BF%3A%20%20%0A*%20It%20actually%20works%20quite%20well%20%20%0AGood%20MAE%20performance%20on%20prediction%3B%20good%20rank%20performance%20on%20top-*N*%20%20%0A*%20Efficient%20implementation%20%20%0A%20%20%20%20%u5F53user%u4E2A%u6570%u8FDC%u5927%u4E8Eitem%u4E2A%u6570%u65F6%3A%20%7CU%7C%20%3E%3E%20%7CI%7C.%20%u6CE8%u610F%u5230%2C%u5BF9user%u63A8%u8350item%u65F6%2C%20%u641C%u7D22%u7684%u8FD1%u90BBitems%u662F%u5728user%u6253%u8FC7%u5206%u7684items%u4E2D%u5BFB%u627E%u7684.%u56E0%u6B64%u8BE5%u8BA1%u7B97%u91CF%u5E76%u4E0D%u4F1A%u5F88%u5927.%20%20%20%20%0A%20%20%20%20%u6B64%u5916%2Citem-item%u7684%u76F8%u4F3C%u5EA6%u77E9%u9635%u53EF%u4EE5%u5B9E%u73B0%u8BA1%u7B97%u597D.%20%20%0A%0A%0A%0A%0A%23%23%23Item-based%27s%20limitation%3A%20%20%0A**lower%20serendipity**%2C%20not%20fully%20studied%3B%20intuition%20is%20clear%0A%0A---%0A%0A%23%23%u56DB%20%u8BFE%u7A0B%u6750%u6599%20%20%0A%5Bvedios%5D%28http%3A//pan.baidu.com/s/1gdGCNpL%29%20%20%0A%5Bslides%5D%28http%3A//pan.baidu.com/s/1i3C4WtB%29%20%20%0A%5Bcode%5D%28http%3A//pan.baidu.com/s/1kTLUsD5%29%20%20%20%0A%0A%3E%u4E0B%u4E00%u8282%u7684Latent%20factor%20model%u7684%u8BFE%u7A0B%u6750%u6599%u4E00%u5E76%u5728%u8FD9%u91CC%u8D34%u51FA%uFF0C%u4E0D%u518D%u5355%u72EC%u5199notes.%20%20%0A%0A%5Bvedios%5D%28http%3A//pan.baidu.com/s/1mgHO5wW%29%20%20%0A%5Bslides%5D%28http%3A//pan.baidu.com/s/1AbzP0%29%20%20%0A%5Bcode%5D%28http%3A//pan.baidu.com/s/1mgyRDyg%29%0A%0A----%0A%23%23%u63A8%u8350%u9605%u8BFB%20%20%0A1.%20%5BItem-to-Item%20Collaborative%20Filtering%20with%20Amazon%27s%20Recommendation%20System%5D%28http%3A//blog.echen.me/2011/02/15/an-overview-of-item-to-item-collaborative-filtering-with-amazons-recommendation-system/%29%0A2.%20%5BEkstrand%20M%20D%2C%20Kannan%20P%2C%20Stemper%20J%20A%2C%20et%20al.%20Automatically%20building%20research%20reading%20lists%5BC%5D//Proceedings%20of%20the%20fourth%20ACM%20conference%20on%20Recommender%20systems.%20ACM%2C%202010%3A%20159-166.%5D%28http%3A//files.grouplens.org/papers/reading-lists.pdf%29


comments powered by Disqus