User-based Collaborative Filtering

User-based CF consists of the following steps:

1. Selecting Neighborhoods

1.1 Select all neighbors  
1.2 Threshold on similarity or distance  
1.3 Random neighbors  
1.4 Top-*N* neighbors by similarity or distance  
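In theory, given a good similarity measure, the more users are selected as neighbors, the broader the range of interests they cover for the target user.  
In practice, the more users are selected, the more likely it is that noise is introduced.  
Typically 25 to 100 users are selected as neighbors.  
The experiments in this section use Top-30.  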
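
The four selection strategies listed above can be sketched as follows. This is a minimal illustration, not the course code: the function name `select_neighbors`, its parameters, and the `{user_id: similarity}` input format are all assumptions made for the example.

```python
import random

def select_neighbors(sims, strategy="top_n", n=30, threshold=0.0, seed=0):
    """Pick neighbors from a {user_id: similarity} mapping (hypothetical format)."""
    users = list(sims)
    if strategy == "all":
        # 1.1: every other user is a neighbor
        return users
    if strategy == "threshold":
        # 1.2: keep users whose similarity reaches the threshold
        return [u for u in users if sims[u] >= threshold]
    if strategy == "random":
        # 1.3: sample n users uniformly (seeded for reproducibility)
        rng = random.Random(seed)
        return rng.sample(users, min(n, len(users)))
    if strategy == "top_n":
        # 1.4: the n most similar users, most similar first
        return sorted(users, key=lambda u: sims[u], reverse=True)[:n]
    raise ValueError(f"unknown strategy: {strategy}")

# Toy similarities, invented for illustration only.
sims = {"a": 0.9, "b": 0.1, "c": 0.5, "d": -0.2}
print(select_neighbors(sims, "top_n", n=2))
print(select_neighbors(sims, "threshold", threshold=0.4))
```

With a larger `n` the top-N list starts to admit weakly similar users such as `b` and `d`, which is the noise trade-off described above.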

2. Scoring Items from Neighborhoods

How do we predict a given user's rating of an item from the neighbors' ratings of that item?  
The following methods can be considered:  
- Average
- Weighted Average
- Multiple linear regression

Of these, the weighted average is commonly used and effective; the experiments in this section use it.
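
To make the difference between the first two methods concrete, here is a small sketch. The helper names and the toy numbers are assumptions for illustration; dividing by the sum of absolute similarities is one common convention, chosen here so that negative similarities do not shrink the denominator.

```python
def average_score(ratings):
    """Plain average of the neighbors' ratings for one item."""
    return sum(ratings.values()) / len(ratings)

def weighted_average_score(ratings, sims):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    den = sum(abs(sims[v]) for v in ratings)
    if den == 0:
        return None  # no usable neighbor, so no prediction
    return sum(sims[v] * ratings[v] for v in ratings) / den

# Two neighbors rated the item; "a" is far more similar to the target user.
ratings = {"a": 5.0, "b": 1.0}
sims = {"a": 0.75, "b": 0.25}
print(average_score(ratings))                 # 3.0
print(weighted_average_score(ratings, sims))  # 4.0
```

The weighted version pulls the prediction toward the rating of the more similar neighbor.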

3. Normalizing Data

Users' rating habits differ: some tend to give high scores, others low scores. It is therefore necessary to normalize each user's ratings.  
One option is mean-centering, i.e. subtract the user's mean rating prior to computing.  
Another option is z-score normalization:  
a. Mean-center, and divide by the standard deviation
b. Normalizes for the spread across the scale
c. Small additional gain in prediction accuracy over mean-centering
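
Both normalizations can be sketched per user as below. The function names and the `{item_id: rating}` format are assumptions; the population standard deviation is used here, although the sample standard deviation would be an equally defensible choice.

```python
import statistics

def mean_center(ratings):
    """Subtract the user's mean rating from each of their ratings."""
    mu = statistics.fmean(ratings.values())
    return {item: r - mu for item, r in ratings.items()}

def z_score(ratings):
    """Mean-center, then divide by the user's standard deviation."""
    values = list(ratings.values())
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # The user gave every item the same rating: no spread to normalize.
        return {item: 0.0 for item in ratings}
    return {item: (r - mu) / sigma for item, r in ratings.items()}

# A generous and a harsh rater with the same relative preferences.
generous = {"x": 5.0, "y": 3.0, "z": 4.0}
harsh = {"x": 3.0, "y": 1.0, "z": 2.0}
print(mean_center(generous))
print(mean_center(harsh))
```

After mean-centering, the two users above have identical offsets, which is the effect the normalization is meant to achieve.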

The experiments in this section use mean-centering combined with the weighted average, defined as follows:

$$P_{u,i} = \mu_u + \frac{\sum_{v \in N(u;i)} s(u,v)\,(r_{v,i} - \mu_v)}{\sum_{v \in N(u;i)} \lvert s(u,v) \rvert}$$

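That is, compute the weighted average of each neighbor $v$'s offset from that neighbor's own average, $(r_{v,i} - \mu_v)$, then add the user's average rating $\mu_u$. Here $s(u,v)$ is the similarity between users $u$ and $v$, $r_{v,i}$ is $v$'s rating of item $i$, and $N(u;i)$ is the set of neighbors of $u$ for item $i$, that is, the neighbors who have rated $i$.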
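
A direct transcription of this prediction rule might look like the sketch below. The nested-dict layout of `ratings`, the `sims` mapping, and the fallback to the user's mean when no neighbor has rated the item are assumptions of the example, not details taken from the course code.

```python
def user_mean(user_ratings):
    """Mean of one user's ratings, given as {item_id: rating}."""
    return sum(user_ratings.values()) / len(user_ratings)

def predict(user, item, ratings, sims, neighbors):
    """Mean-centered, similarity-weighted prediction of user's rating for item.

    ratings: {user_id: {item_id: rating}}; sims: {neighbor_id: s(user, neighbor)}.
    """
    mu_u = user_mean(ratings[user])
    num = 0.0
    den = 0.0
    for v in neighbors:
        if item not in ratings[v]:
            continue  # only neighbors who rated the item belong to N(u; i)
        mu_v = user_mean(ratings[v])
        num += sims[v] * (ratings[v][item] - mu_v)
        den += abs(sims[v])
    if den == 0:
        return mu_u  # assumed fallback: no usable neighbor
    return mu_u + num / den

# Toy data, invented for illustration.
ratings = {
    "u": {"i1": 4.0, "i2": 2.0},
    "v1": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
    "v2": {"i1": 1.0, "i3": 3.0},
}
sims = {"v1": 0.5, "v2": 0.5}
print(predict("u", "i3", ratings, sims, ["v1", "v2"]))  # 3.5
```

In the toy data, `v1` rates `i3` exactly at their own mean (offset 0) and `v2` rates it one point above theirs (offset 1), so the prediction is the target user's mean of 3.0 plus 0.5.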

4. Computing Similarities

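How are a user's neighbors selected?  
4.1 Use each user's ratings as features, compute the cosine distance, and select the Top-*30* nearest users  
4.2 Use the Pearson correlation coefficient  
4.3 Use clustering to find neighboring users, then pick the user's cluster to generate predictions.

The experiments in this section use method 4.1.  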
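
Methods 4.1 and 4.2 can be sketched as follows. The function names and data layout are assumptions. The sketch ranks by cosine similarity, which is equivalent to ranking by smallest cosine distance; it treats unrated items as 0 for the cosine, and computes the Pearson means over co-rated items only. Other variants of both choices exist.

```python
import math

def cosine_similarity(ra, rb):
    """Cosine between two sparse rating vectors {item_id: rating}."""
    common = set(ra) & set(rb)
    dot = sum(ra[i] * rb[i] for i in common)
    norm_a = math.sqrt(sum(r * r for r in ra.values()))
    norm_b = math.sqrt(sum(r * r for r in rb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pearson_similarity(ra, rb):
    """Pearson correlation over the items both users rated."""
    common = sorted(set(ra) & set(rb))
    if len(common) < 2:
        return 0.0  # not enough overlap to measure correlation
    mean_a = sum(ra[i] for i in common) / len(common)
    mean_b = sum(rb[i] for i in common) / len(common)
    cov = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common)
    var_a = sum((ra[i] - mean_a) ** 2 for i in common)
    var_b = sum((rb[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def top_n_neighbors(user, ratings, n=30, sim=cosine_similarity):
    """The n users most similar to `user`, most similar first."""
    scores = {v: sim(ratings[user], ratings[v]) for v in ratings if v != user}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy data, invented for illustration.
ratings = {
    "u": {"i1": 5.0, "i2": 1.0},
    "v1": {"i1": 5.0, "i2": 1.0},
    "v2": {"i1": 1.0, "i2": 5.0},
}
print(top_n_neighbors("u", ratings, n=1))
```

Passing `sim=pearson_similarity` to `top_n_neighbors` switches the same selection to method 4.2.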

Course materials for this section

video: http://pan.baidu.com/s/1c09rqt6  
slides: http://pan.baidu.com/s/1dDyzSMt  
code: http://pan.baidu.com/s/1tMsGm


Recommended reading:

  1. Herlocker J L, Konstan J A, Borchers A, et al. An algorithmic framework for performing collaborative filtering[C]//Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1999: 230-237.
  2. Herlocker J L, Konstan J A, Riedl J. Explaining collaborative filtering recommendations[C]//Proceedings of the 2000 ACM conference on Computer supported cooperative work. ACM, 2000: 241-250.