Title: | Lab for Developing and Testing Recommender Algorithms |
---|---|
Description: | Provides a research infrastructure to develop and evaluate collaborative filtering recommender algorithms. This includes a sparse representation for user-item matrices, many popular algorithms, top-N recommendations, and cross-validation. Hahsler (2022) <doi:10.48550/arXiv.2205.12371>. |
Authors: | Michael Hahsler [aut, cre, cph] , Bregt Vereet [ctb] |
Maintainer: | Michael Hahsler <[email protected]> |
License: | GPL-2 |
Version: | 1.0.6 |
Built: | 2024-10-26 05:26:50 UTC |
Source: | https://github.com/mhahsler/recommenderlab |
A matrix to represent binary rating data. 1 codes for a positive rating and 0 codes for either no or a negative rating. This coding is common for market basked data where products are either bought or not.
Objects can be created by calls of the form new("binaryRatingMatrix", data = im)
, where im
is an itemMatrix
as defined in package
arules, by coercion from a matrix (all non-zero values will be a 1),
or by using binarize
for
an object of class "realRatingMatrix".
data
:Object of class "itemMatrix"
(see package arules)
Class "ratingMatrix"
, directly.
signature(from = "matrix", to = "binaryRatingMatrix")
:
The matrix needs to be a logical matrix, or a 0-1 matrix (0 means FALSE and 1 means TRUE).
NAs are interpreted as FALSE.
signature(from = "itemMatrix", to = "binaryRatingMatrix")
signature(from = "data.frame", to = "binaryRatingMatrix")
signature(from = "binaryRatingMatrix", to = "matrix")
signature(from = "binaryRatingMatrix", to = "dgTMatrix")
signature(from = "binaryRatingMatrix", to = "ngCMatrix")
signature(from = "binaryRatingMatrix", to = "dgCMatrix")
signature(from = "binaryRatingMatrix", to = "itemMatrix")
signature(from = "binaryRatingMatrix", to = "list")
itemMatrix
in arules,
getList
.
## create a 0-1 matrix m <- matrix(sample(c(0,1), 50, replace=TRUE), nrow=5, ncol=10, dimnames=list(users=paste("u", 1:5, sep=''), items=paste("i", 1:10, sep=''))) m ## coerce it into a binaryRatingMatrix b <- as(m, "binaryRatingMatrix") b ## coerce it back to see if it worked as(b, "matrix") ## use some methods defined in ratingMatrix dim(b) dimnames(b) ## counts rowCounts(b) ## number of ratings per user colCounts(b) ## number of ratings per item ## plot image(b) ## sample and subset sample(b,2) b[1:2,1:5] ## coercion as(b, "list") head(as(b, "data.frame")) head(getData.frame(b, ratings=FALSE)) ## creation from user/item tuples df <- data.frame(user=c(1,1,2,2,2,3), items=c(1,4,1,2,3,5)) df b2 <- as(df, "binaryRatingMatrix") b2 as(b2, "matrix")
## create a 0-1 matrix m <- matrix(sample(c(0,1), 50, replace=TRUE), nrow=5, ncol=10, dimnames=list(users=paste("u", 1:5, sep=''), items=paste("i", 1:10, sep=''))) m ## coerce it into a binaryRatingMatrix b <- as(m, "binaryRatingMatrix") b ## coerce it back to see if it worked as(b, "matrix") ## use some methods defined in ratingMatrix dim(b) dimnames(b) ## counts rowCounts(b) ## number of ratings per user colCounts(b) ## number of ratings per item ## plot image(b) ## sample and subset sample(b,2) b[1:2,1:5] ## coercion as(b, "list") head(as(b, "data.frame")) head(getData.frame(b, ratings=FALSE)) ## creation from user/item tuples df <- data.frame(user=c(1,1,2,2,2,3), items=c(1,4,1,2,3,5)) df b2 <- as(df, "binaryRatingMatrix") b2 as(b2, "matrix")
Calculate prediction accuracy. For predicted ratings MAE (mean average error), MSE (means squared error) and RMSE (root means squared error) are calculated. For topNLists various binary classification metrics are returned (e.g., precision, recall, TPR, FPR).
calcPredictionAccuracy(x, data, ...) ## S4 method for signature 'realRatingMatrix,realRatingMatrix' calcPredictionAccuracy(x, data, byUser = FALSE, ...) ## S4 method for signature 'topNList,realRatingMatrix' calcPredictionAccuracy(x, data, byUser = FALSE, given = NULL, goodRating = NA, ...) ## S4 method for signature 'topNList,binaryRatingMatrix' calcPredictionAccuracy(x, data, byUser = FALSE, given = NULL, ...)
calcPredictionAccuracy(x, data, ...) ## S4 method for signature 'realRatingMatrix,realRatingMatrix' calcPredictionAccuracy(x, data, byUser = FALSE, ...) ## S4 method for signature 'topNList,realRatingMatrix' calcPredictionAccuracy(x, data, byUser = FALSE, given = NULL, goodRating = NA, ...) ## S4 method for signature 'topNList,binaryRatingMatrix' calcPredictionAccuracy(x, data, byUser = FALSE, given = NULL, ...)
x |
Predicted items in a "topNList" or predicted ratings as a "realRatingMatrix" |
data |
Observed true ratings for the users as a "RatingMatrix". The users have to be in the same order as in |
byUser |
logical; Should the accuracy measures be reported for each user individually instead of being averaged over all users? |
given |
how many items were given to create the predictions. If the data comes from an evaluation scheme that usses all-but-x (i.e., a negative value for |
goodRating |
If |
... |
further arguments. |
The function calculates the accuracy of predictions compared to the observed true ratings (data
) averaged over the users. Use byUser = TRUE
to get the results for each user.
If both, the predictions are numeric ratings (i.e. a "realRatingMatrix"), then the error measures RMSE, MSE and MAE are calculated.
If the predictions are a "topNList", then the entries of the confusion matrix (true positives TP, false positives FP, false negatives FN and true negatives TN) and binary classification measures like precision, recall, TPR and FPR are calculated. If data is a "realRatingMatrix", then
goodRating
has to be specified to identify items that should be recommended (i.e., have a rating of goodRating or more).
Note that you need to specify the number of items given to the recommender to create predictions.
The number of predictions by user (N) is the total number of items in the data minus the number of given items. The number of TP is limited by the size of the top-N list. Also, since the counts for TP, FP, FN and TN are averaged over the users (unless byUser = TRUE
is used),
they will not be whole numbers.
If the ratings are a "topNList" and the observed data is a "realRatingMatrix" then goodRating
is used
to determine what rating in data
is considered a good rating for calculating binary classification measures. This means that an item in the topNList is considered a true positive if it has a rating of goodRating
or better in the observed data.
Returns a vector with the appropriate measures averaged over all users.
For byUser=TRUE
, a matrix with a row for each user is returned.
Asela Gunawardana and Guy Shani (2009). A Survey of Accuracy Evaluation Metrics of Recommendation Tasks, Journal of Machine Learning Research 10, 2935-2962.
topNList
,
binaryRatingMatrix
,
realRatingMatrix
.
### recommender for real-valued ratings data(Jester5k) ## create 90/10 split (known/unknown) for the first 500 users in Jester5k e <- evaluationScheme(Jester5k[1:500, ], method = "split", train = 0.9, k = 1, given = 15) e ## create a user-based CF recommender using training data r <- Recommender(getData(e, "train"), "UBCF") ## create predictions for the test data using known ratings (see given above) p <- predict(r, getData(e, "known"), type = "ratings") p ## compute error metrics averaged per user and then averaged over all ## recommendations calcPredictionAccuracy(p, getData(e, "unknown")) head(calcPredictionAccuracy(p, getData(e, "unknown"), byUser = TRUE)) ## evaluate topNLists instead (you need to specify given and goodRating!) p <- predict(r, getData(e, "known"), type = "topNList") p calcPredictionAccuracy(p, getData(e, "unknown"), given = 15, goodRating = 5) ## evaluate a binary recommender data(MSWeb) MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) >10,], 50) e <- evaluationScheme(MSWeb10, method="split", train = 0.9, k = 1, given = 3) e ## create a user-based CF recommender using training data r <- Recommender(getData(e, "train"), "UBCF") ## create predictions for the test data using known ratings (see given above) p <- predict(r, getData(e, "known"), type="topNList", n = 10) p calcPredictionAccuracy(p, getData(e, "unknown"), given = 3) calcPredictionAccuracy(p, getData(e, "unknown"), given = 3, byUser = TRUE)
### recommender for real-valued ratings data(Jester5k) ## create 90/10 split (known/unknown) for the first 500 users in Jester5k e <- evaluationScheme(Jester5k[1:500, ], method = "split", train = 0.9, k = 1, given = 15) e ## create a user-based CF recommender using training data r <- Recommender(getData(e, "train"), "UBCF") ## create predictions for the test data using known ratings (see given above) p <- predict(r, getData(e, "known"), type = "ratings") p ## compute error metrics averaged per user and then averaged over all ## recommendations calcPredictionAccuracy(p, getData(e, "unknown")) head(calcPredictionAccuracy(p, getData(e, "unknown"), byUser = TRUE)) ## evaluate topNLists instead (you need to specify given and goodRating!) p <- predict(r, getData(e, "known"), type = "topNList") p calcPredictionAccuracy(p, getData(e, "unknown"), given = 15, goodRating = 5) ## evaluate a binary recommender data(MSWeb) MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) >10,], 50) e <- evaluationScheme(MSWeb10, method="split", train = 0.9, k = 1, given = 3) e ## create a user-based CF recommender using training data r <- Recommender(getData(e, "train"), "UBCF") ## create predictions for the test data using known ratings (see given above) p <- predict(r, getData(e, "known"), type="topNList", n = 10) p calcPredictionAccuracy(p, getData(e, "unknown"), given = 3) calcPredictionAccuracy(p, getData(e, "unknown"), given = 3, byUser = TRUE)
Calculate dissimilarities/similarities between ratings by users and for items.
## S4 method for signature 'binaryRatingMatrix' dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "users") ## S4 method for signature 'realRatingMatrix' dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "users") similarity(x, y = NULL, method = NULL, args = NULL, ...) ## S4 method for signature 'ratingMatrix' similarity(x, y = NULL, method = NULL, args = NULL, which = "users", min_matching = 0, min_predictive = 0)
## S4 method for signature 'binaryRatingMatrix' dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "users") ## S4 method for signature 'realRatingMatrix' dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "users") similarity(x, y = NULL, method = NULL, args = NULL, ...) ## S4 method for signature 'ratingMatrix' similarity(x, y = NULL, method = NULL, args = NULL, which = "users", min_matching = 0, min_predictive = 0)
x |
a ratingMatrix. |
y |
|
method |
(dis)similarity measure to use. Available measures
are typically |
args |
a list of additional arguments for the methods. |
which |
a character string indicating if the (dis)similarity should be
calculated between |
min_matching , min_predictive
|
Thresholds on the minimum number of ratings used to calculate the similarity and the minimum number of ratings that can be used for prediction. |
... |
further arguments. |
Most dissimlarites and similarities are calculated using the proxy package.
Similarities are typically converted into dissimilarities using
or
(used for Jaccard, Cosine and Pearson correlation) depending on the measure.
Similarities are usually defined in the range of , however,
Cosine similarity and Pearson correlation are defined in the interval
. We rescale these
measures with
to the interval
.
Similarities are calculated using only the ratings that are available for both
users/items. This can lead to calculating the measure using only a very small number (maybe only one)
of ratings. min_matching
is the required number of shared ratings to calculate similarities.
To predict ratings, there need to be additional ratings in argument y
.
min_predictive
is the required number of additional ratings to calculate similarities. If
min_matching
or min_predictive
fails, then NA
is reported instead of the calculated similarity.
returns an object of class "dist"
, "simil"
or an appropriate object (e.g.,
a matrix with class "crossdist"
o "crosssimil"
) to represent
a cross-(dis)similarity.
ratingMatrix
,
dissimilarity
in arules, and
dist
in proxy.
data(MSWeb) ## between 5 users dissimilarity(MSWeb[1:5,], method = "jaccard") similarity(MSWeb[1:5,], method = "jaccard") ## between first 3 items dissimilarity(MSWeb[,1:3], method = "jaccard", which = "items") similarity(MSWeb[,1:3], method = "jaccard", which = "items") ## cross-similarity between first 2 users and users 10-20 similarity(MSWeb[1:2,], MSWeb[10:20,], method="jaccard")
data(MSWeb) ## between 5 users dissimilarity(MSWeb[1:5,], method = "jaccard") similarity(MSWeb[1:5,], method = "jaccard") ## between first 3 items dissimilarity(MSWeb[,1:3], method = "jaccard", which = "items") similarity(MSWeb[,1:3], method = "jaccard", which = "items") ## cross-similarity between first 2 users and users 10-20 similarity(MSWeb[1:2,], MSWeb[10:20,], method="jaccard")
Calculate the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE) and for matrices also the Frobenius norm (identical to RMSE).
MSE(true, predicted, na.rm = TRUE) RMSE(true, predicted, na.rm = TRUE) MAE(true, predicted, na.rm = TRUE) frobenius(true, predicted, na.rm = TRUE)
MSE(true, predicted, na.rm = TRUE) RMSE(true, predicted, na.rm = TRUE) MAE(true, predicted, na.rm = TRUE) frobenius(true, predicted, na.rm = TRUE)
true |
true values. |
predicted |
predicted values |
na.rm |
ignore missing values. |
Frobenius norm requires matrices.
The error value.
true <- rnorm(10) predicted <- rnorm(10) MAE(true, predicted) MSE(true, predicted) RMSE(true, predicted) true <- matrix(rnorm(9), nrow = 3) predicted <- matrix(rnorm(9), nrow = 3) frobenius(true, predicted)
true <- rnorm(10) predicted <- rnorm(10) MAE(true, predicted) MSE(true, predicted) RMSE(true, predicted) true <- matrix(rnorm(9), nrow = 3) predicted <- matrix(rnorm(9), nrow = 3) frobenius(true, predicted)
Evaluates a single or a list of recommender model given an evaluation scheme and return evaluation metrics.
evaluate(x, method, ...) ## S4 method for signature 'evaluationScheme,character' evaluate(x, method, type="topNList", n=1:10, parameter=NULL, progress = TRUE, keepModel=FALSE) ## S4 method for signature 'evaluationScheme,list' evaluate(x, method, type="topNList", n=1:10, parameter=NULL, progress = TRUE, keepModel=FALSE)
evaluate(x, method, ...) ## S4 method for signature 'evaluationScheme,character' evaluate(x, method, type="topNList", n=1:10, parameter=NULL, progress = TRUE, keepModel=FALSE) ## S4 method for signature 'evaluationScheme,list' evaluate(x, method, type="topNList", n=1:10, parameter=NULL, progress = TRUE, keepModel=FALSE)
x |
an evaluation scheme (class |
method |
a character string or a list. If
a single character string is given it defines the recommender method
used for evaluation. If several recommender methods need to be compared,
|
type |
evaluate "topNList" or "ratings"? |
n |
a vector of the different values for N used to generate top-N lists (only if type="topNList"). |
parameter |
a list with parameters for the recommender algorithm (only
used when |
progress |
logical; report progress? |
keepModel |
logical; store used recommender models? |
... |
further arguments. |
The evaluation uses the specification in the evaluation scheme to train a recommender models on training data and then evaluates the models on test data.
The result is a set of accuracy measures averaged over the test users.
See calcPredictionAccuracy
for details on the accuracy measures and the averaging.
Note: Also the confusion matrix counts are averaged over users and therefore not whole numbers.
See vignette("recommenderlab")
for more details on the evaluaiton process and the used metrics.
If a single recommender method is specified in method
, then an
object of class "evaluationResults"
is returned.
If method
is a list of recommendation models, then an object of class "evaluationResultList"
is returned.
calcPredictionAccuracy
,
evaluationScheme
,
evaluationResults
.
evaluationResultList
.
### evaluate top-N list recommendations on a 0-1 data set ## Note: we sample only 100 users to make the example run faster data("MSWeb") MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) >10,], 100) ## create an evaluation scheme (10-fold cross validation, given-3 scheme) es <- evaluationScheme(MSWeb10, method="cross-validation", k=10, given=3) ## run evaluation ev <- evaluate(es, "POPULAR", n=c(1,3,5,10)) ev ## look at the results (the length of the topNList is shown as column n) getResults(ev) ## get a confusion matrices averaged over the 10 folds avg(ev) plot(ev, annotate = TRUE) ## evaluate several algorithms (including a hybrid recommender) with a list algorithms <- list( RANDOM = list(name = "RANDOM", param = NULL), POPULAR = list(name = "POPULAR", param = NULL), HYBRID = list(name = "HYBRID", param = list(recommenders = list( RANDOM = list(name = "RANDOM", param = NULL), POPULAR = list(name = "POPULAR", param = NULL) ) ) ) ) evlist <- evaluate(es, algorithms, n=c(1,3,5,10)) evlist names(evlist) ## select the first results by index evlist[[1]] avg(evlist[[1]]) plot(evlist, legend="topright") ### Evaluate using a data set with real-valued ratings ## Note: we sample only 100 users to make the example run faster data("Jester5k") es <- evaluationScheme(Jester5k[1:100], method="split", train=.9, given=10, goodRating=5) ## Note: goodRating is used to determine positive ratings ## predict top-N recommendation lists ## (results in TPR/FPR and precision/recall) ev <- evaluate(es, "RANDOM", type="topNList", n=10) getResults(ev) ## predict missing ratings ## (results in RMSE, MSE and MAE) ev <- evaluate(es, "RANDOM", type="ratings") getResults(ev)
### evaluate top-N list recommendations on a 0-1 data set ## Note: we sample only 100 users to make the example run faster data("MSWeb") MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) >10,], 100) ## create an evaluation scheme (10-fold cross validation, given-3 scheme) es <- evaluationScheme(MSWeb10, method="cross-validation", k=10, given=3) ## run evaluation ev <- evaluate(es, "POPULAR", n=c(1,3,5,10)) ev ## look at the results (the length of the topNList is shown as column n) getResults(ev) ## get a confusion matrices averaged over the 10 folds avg(ev) plot(ev, annotate = TRUE) ## evaluate several algorithms (including a hybrid recommender) with a list algorithms <- list( RANDOM = list(name = "RANDOM", param = NULL), POPULAR = list(name = "POPULAR", param = NULL), HYBRID = list(name = "HYBRID", param = list(recommenders = list( RANDOM = list(name = "RANDOM", param = NULL), POPULAR = list(name = "POPULAR", param = NULL) ) ) ) ) evlist <- evaluate(es, algorithms, n=c(1,3,5,10)) evlist names(evlist) ## select the first results by index evlist[[1]] avg(evlist[[1]]) plot(evlist, legend="topright") ### Evaluate using a data set with real-valued ratings ## Note: we sample only 100 users to make the example run faster data("Jester5k") es <- evaluationScheme(Jester5k[1:100], method="split", train=.9, given=10, goodRating=5) ## Note: goodRating is used to determine positive ratings ## predict top-N recommendation lists ## (results in TPR/FPR and precision/recall) ev <- evaluate(es, "RANDOM", type="topNList", n=10) getResults(ev) ## predict missing ratings ## (results in RMSE, MSE and MAE) ev <- evaluate(es, "RANDOM", type="ratings") getResults(ev)
Contains the evaluation results for several runs using multiple recommender methods in form of confusion matrices. For each run the used models might be avialable.
Objects are created by evaluate
.
.Data
:Object of class "list"
: a list of
"evaluationResults"
.
Class "list"
, from data part.
signature(x = "evaluationResultList")
: returns a
list of average confusion matrices.
signature(x = "evaluationResultList", i = "ANY", j = "missing", drop = "missing")
signature(from = "list", to = "evaluationResultList")
signature(object = "evaluationResultList")
Contains the evaluation results for several runs using the same recommender method in form of confusion matrices. For each run the used model might be avialable.
Objects are created by evaluate
.
results
:Object of class "list"
: contains
objects of class "ConfusionMatrix"
, one for each run specified
in the used evaluation scheme.
signature(x = "evaluationResults")
: returns the evaluation metrics averaged of cross-validation folds.
signature(x = "evaluationResults")
:
Deprecated. Use getResults()
.
signature(x = "evaluationResults")
:
returns a list of evaluation metrics with one element for each cross-valudation fold.
signature(x = "evaluationResults")
: returns a
list of used recommender models (if avilable).
signature(x = "evaluationResults")
: returns
the number of runs/number of confusion matrices.
signature(object = "evaluationResults")
Creates an evaluationScheme object from a data set. The scheme can be a simple split into training and test data, k-fold cross-evaluation or using k independent bootstrap samples.
evaluationScheme(data, ...) ## S4 method for signature 'ratingMatrix' evaluationScheme(data, method="split", train=0.9, k=NULL, given, goodRating = NA)
evaluationScheme(data, ...) ## S4 method for signature 'ratingMatrix' evaluationScheme(data, method="split", train=0.9, k=NULL, given, goodRating = NA)
data |
data set as a ratingMatrix. |
method |
a character string defining the evaluation method to use (see details). |
train |
fraction of the data set used for training. |
k |
number of folds/times to run the evaluation (defaults to 10 for cross-validation and bootstrap and 1 for split). |
given |
single number of items given for evaluation or
a vector of length of data giving the number of items given for each
observation. Negative values implement all-but schemes. For example,
|
goodRating |
numeric; threshold at which ratings are considered
good for evaluation. E.g., with |
... |
further arguments. |
evaluationScheme
creates an evaluation scheme (training and test data)
with k
runs and one of the following methods:
"split"
randomly assigns
the proportion of objects specified by train
to the training set and
the rest is used for the test set.
"cross-validation"
creates a k-fold cross-validation scheme. The data
is randomly split into k parts and in each run k-1 parts are used for
training and the remaining part is used for testing. After all k runs each
part was used as the test set exactly once.
"bootstrap"
creates the training set by taking a bootstrap sample
(sampling with replacement) of size train
times number of users in
the data set.
All objects not in the training set are used for testing.
For evaluation, Breese et al. (1998) introduced the
four experimental protocols called Given 2, Given 5, Given 10 and All-but-1.
During testing, the Given x protocol presents the algorithm with
only x randomly chosen items for the test user, and the algorithm
is evaluated by how well it is able to predict the withheld items.
For All-but-x,
the algorithm sees all but
x withheld ratings for the test user.
given
controls x in the evaluations scheme.
Positive integers result in a Given x protocol, while negative values
produce a All-but-x protocol.
If a user does not have enough ratings to satisfy given
, then the user is dropped from the
evaluation with a warning.
Returns an object of class "evaluationScheme"
.
Kohavi, Ron (1995). "A study of cross-validation and bootstrap for accuracy estimation and model selection". Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pp. 1137-1143.
Breese JS, Heckerman D, Kadie C (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering." In Uncertainty in Artificial Intelligence. Proceedings of the Fourteenth Conference, pp. 43-52.
getData
,
evaluationScheme
,
ratingMatrix
.
data("MSWeb") MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) >10,], 50) MSWeb10 ## simple split with 3 items given esSplit <- evaluationScheme(MSWeb10, method="split", train = 0.9, k=1, given=3) esSplit ## 4-fold cross-validation with all-but-1 items for learning. esCross <- evaluationScheme(MSWeb10, method="cross-validation", k=4, given=-1) esCross
data("MSWeb") MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) >10,], 50) MSWeb10 ## simple split with 3 items given esSplit <- evaluationScheme(MSWeb10, method="split", train = 0.9, k=1, given=3) esSplit ## 4-fold cross-validation with all-but-1 items for learning. esCross <- evaluationScheme(MSWeb10, method="cross-validation", k=4, given=-1) esCross
An evaluation scheme created from a data set. The scheme can be a simple split into training and test data, k-fold cross-evaluation or using k bootstrap samples.
Objects can be created by
evaluationScheme(data, method="split", train=0.9, k=NULL, given=3).
data
:Object of class "ratingMatrix"
; the data set.
given
:Object of class "integer"
; given ratings are
randomly selected for each evaluation user and
presented to the recommender
algorithm to calculate recommend items/ratings.
The recommended items are compared
to the remaining items for the evaluation user.
goodRating
:Object of class "numeric"
; Rating at which an item is considered a positive for evaluation.
k
:Object of class "integer"
; number of runs for evaluation. Default is 1 for method "split" and 10 for "cross-validation" and "bootstrap".
knownData
:Object of class "ratingMatrix"
; data set with only known (given) items.
method
:Object of class "character"
; evaluation method. Available methods are: "split", "cross-validation" and "bootstrap".
runsTrain
:Object of class "list"
; internal repesentation for the split in training and test data for the evaluation runs.
train
:Object of class "numeric"
; portion of data used for training for "split" and "bootstrap".
unknownData
:Object of class "ratingMatrix"
; data set with only unknown items.
signature(x = "evaluationScheme")
: access data.
Parameters are type
("train", "known" or "unknown", "given") and
run
(1...k).
"train"
returns the training data for the run,
"known"
returns the known ratings used for prediction
for the test data,
"unknown"
returns the ratings used for evaluation
for the test data, and
"given"
returns the number of items that were given in "known." If the given
items
was a positive number, then this will be a vector with this number, but if given
was negative (all-but-x),
then the number of given items for each test user will be different.
signature(object = "evaluationScheme")
ratingMatrix
and
the creator function evaluationScheme
.
Implements matrix decomposition by the stochastic gradient descent optimization popularized by Simon Funk to minimize the error on the known values.
This function is used by the recommender method "SVDF" (see Recommender
).
funkSVD(x, k = 10, gamma = 0.015, lambda = 0.001, min_improvement = 1e-06, min_epochs = 50, max_epochs = 200, verbose = FALSE)
funkSVD(x, k = 10, gamma = 0.015, lambda = 0.001, min_improvement = 1e-06, min_epochs = 50, max_epochs = 200, verbose = FALSE)
x |
a matrix, potentially containing NAs. |
k |
number of features (i.e, rank of the approximation). |
gamma |
regularization term. |
lambda |
learning rate. |
min_improvement |
required minimum improvement per iteration. |
min_epochs |
minimum number of iterations per feature. |
max_epochs |
maximum number of iterations per feature. |
verbose |
show progress. |
Funk SVD decomposes a matrix (with missing values)
into two components and
.
The singular values are folded into these matrices.
The approximation
for the original matrix can be obtained by
.
This function predict
in this implementation folds in new data rows
by estimating the vectors using gradient descend and then calculating
the reconstructed complete matrix r for these users via
.
An object of class "funkSVD"
with components
U |
the |
V |
the |
parameters |
a list with parameter values. |
The code is based on the implmentation in package rrecsys by Ludovik Coba and Markus Zanker.
Y. Koren, R. Bell, and C. Volinsky. Matrix Factorization Techniques for Recommender Systems, IEEE Computer, pp. 42-49, August 2009.
# this takes a while to run! ## Not run: data("Jester5k") # helper to calculate root mean squared error rmse <- function(pred, truth) sqrt(sum((truth-pred)^2, na.rm = TRUE)) train <- as(Jester5k[1:100], "matrix") fsvd <- funkSVD(train, verbose = TRUE) # reconstruct the original rating matrix as R = UV' r <- tcrossprod(fsvd$U, fsvd$V) rmse(train, r) # fold in new users for matrix completion test <- as(Jester5k[101:105], "matrix") p <- predict(fsvd, test, verbose = TRUE) rmse(test, p) ## End(Not run)
# this takes a while to run! ## Not run: data("Jester5k") # helper to calculate root mean squared error rmse <- function(pred, truth) sqrt(sum((truth-pred)^2, na.rm = TRUE)) train <- as(Jester5k[1:100], "matrix") fsvd <- funkSVD(train, verbose = TRUE) # reconstruct the original rating matrix as R = UV' r <- tcrossprod(fsvd$U, fsvd$V) rmse(train, r) # fold in new users for matrix completion test <- as(Jester5k[101:105], "matrix") p <- predict(fsvd, test, verbose = TRUE) rmse(test, p) ## End(Not run)
Create a list or data.frame representation for various objects
used in recommenderlab. These functions are used in addition to
available coercion to allow for parameters like decode
.
getList(from, ...) ## S4 method for signature 'realRatingMatrix' getList(from, decode = TRUE, ratings = TRUE, ...) ## S4 method for signature 'binaryRatingMatrix' getList(from, decode = TRUE, ...) ## S4 method for signature 'topNList' getList(from, decode = TRUE, ...) getData.frame(from, ...) ## S4 method for signature 'ratingMatrix' getData.frame(from, decode = TRUE, ratings = TRUE, ...)
getList(from, ...) ## S4 method for signature 'realRatingMatrix' getList(from, decode = TRUE, ratings = TRUE, ...) ## S4 method for signature 'binaryRatingMatrix' getList(from, decode = TRUE, ...) ## S4 method for signature 'topNList' getList(from, decode = TRUE, ...) getData.frame(from, ...) ## S4 method for signature 'ratingMatrix' getData.frame(from, decode = TRUE, ratings = TRUE, ...)
from |
object to be represented as a list. |
decode |
use item names or item IDs (column numbers) for items? |
ratings |
include ratings in the list or data.frame? |
... |
further arguments (currently unused). |
Lists have one vector with items (and ratings) per user. The data.frame has one row per rating with the user in the first column, the item as the second and the rating as the third.
Returns a list or a data.frame.
binaryRatingMatrix
,
realRatingMatrix
,
topNList
.
data(Jester5k) getList(Jester5k[1,]) getData.frame(Jester5k[1,])
data(Jester5k) getList(Jester5k[1,]) getData.frame(Jester5k[1,])
Creates and combines recommendations using several recommender algorithms.
HybridRecommender(..., weights = NULL, aggregation_type = "sum")
HybridRecommender(..., weights = NULL, aggregation_type = "sum")
... |
objects of class 'Recommender'. |
weights |
weights for the recommenders. The recommenders are equally weighted by default. |
aggregation_type |
How are the recommendations aggregated. Options are "sum", "min", and "max". |
The hybrid recommender is initialized with a set of pretrained Recommender objects. Typically, the algorithms are trained using the same training set. If different training sets are used, then, at least the training sets need to have the same items in the same order.
Alternatively, hybrid recommenders can be created using the regular Recommender()
interface. Here method
is set to HYBRID
and parameter
contains
a list with recommenders and weights. recommenders are a list of recommender alorithms,
where each algorithms is represented as a list with elements name (method of the recommender)
and parameters (the algorithms parameters). This method can be used in evaluate()
For creating recommendations (predict
), each recommender algorithm
is used to create ratings. The individual ratings are combined using
a weighted sum where missing ratings are ignored. Weights can be specified in weights
.
An object of class 'Recommender'.
data("MovieLense") MovieLense100 <- MovieLense[rowCounts(MovieLense) >100,] train <- MovieLense100[1:100] test <- MovieLense100[101:103] ## mix popular movies with a random recommendations for diversity and ## rerecommend some movies the user liked. recom <- HybridRecommender( Recommender(train, method = "POPULAR"), Recommender(train, method = "RANDOM"), Recommender(train, method = "RERECOMMEND"), weights = c(.6, .1, .3) ) recom getModel(recom) as(predict(recom, test), "list") ## create a hybrid recommender using the regular Recommender interface. ## This is needed to use hybrid recommenders with evaluate(). recommenders <- list( RANDOM = list(name = "POPULAR", param = NULL), POPULAR = list(name = "RANDOM", param = NULL), RERECOMMEND = list(name = "RERECOMMEND", param = NULL) ) weights <- c(.6, .1, .3) recom <- Recommender(train, method = "HYBRID", parameter = list(recommenders = recommenders, weights = weights)) recom as(predict(recom, test), "list")
data("MovieLense") MovieLense100 <- MovieLense[rowCounts(MovieLense) >100,] train <- MovieLense100[1:100] test <- MovieLense100[101:103] ## mix popular movies with a random recommendations for diversity and ## rerecommend some movies the user liked. recom <- HybridRecommender( Recommender(train, method = "POPULAR"), Recommender(train, method = "RANDOM"), Recommender(train, method = "RERECOMMEND"), weights = c(.6, .1, .3) ) recom getModel(recom) as(predict(recom, test), "list") ## create a hybrid recommender using the regular Recommender interface. ## This is needed to use hybrid recommenders with evaluate(). recommenders <- list( RANDOM = list(name = "POPULAR", param = NULL), POPULAR = list(name = "RANDOM", param = NULL), RERECOMMEND = list(name = "RERECOMMEND", param = NULL) ) weights <- c(.6, .1, .3) recom <- Recommender(train, method = "HYBRID", parameter = list(recommenders = recommenders, weights = weights)) recom as(predict(recom, test), "list")
Utility functions used internally by recommender algorithms. See files starting
with RECOM
in the package's R
directory for examples of usage.
returnRatings(ratings, newdata, type = c("topNList", "ratings", "ratingMatrix"), n, randomize = NULL, minRating = NA) getParameters(defaults, parameter)
returnRatings(ratings, newdata, type = c("topNList", "ratings", "ratingMatrix"), n, randomize = NULL, minRating = NA) getParameters(defaults, parameter)
ratings |
a realRatingMatrix. |
newdata |
a realRatingMatrix. |
type |
type of recommendation to return. |
n |
max. number of entries in the top-N list. |
randomize |
randomization factor for producing the top-N list. |
minRating |
do not include ratings less than this. |
defaults |
list with parameters and default values. |
parameter |
list with actual parameters. |
returnRatings
is used in the predict function of recommender algorithms
to return different types of recommendations.
getParameters
is a helper function which checks parameters for
consistency and provides default values. Used in the Recommender constructor.
The data set contains a sample of 5000 users from the anonymous ratings data from the Jester Online Joke Recommender System collected between April 1999 and May 2003.
data(Jester5k)
data(Jester5k)
The format of Jester5k
is: Formal class 'realRatingMatrix' [package "recommenderlab"]
The format of JesterJokes
is: vector of character strings.
Jester5k
contains a 5000 x 100 rating matrix (5000 users and 100 jokes)
with ratings between -10.00 and +10.00. All selected users have
rated 36 or more jokes.
The data also contains the actual jokes in JesterJokes
.
Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. "Eigentaste: A Constant Time Collaborative Filtering Algorithm." Information Retrieval, 4(2), 133-151. July 2001.
data(Jester5k) Jester5k ## number of ratings nratings(Jester5k) ## number of ratings per user summary(rowCounts(Jester5k)) ## rating distribution hist(getRatings(Jester5k), main="Distribution of ratings") ## 'best' joke with highest average rating best <- which.max(colMeans(Jester5k)) cat(JesterJokes[best])
data(Jester5k) Jester5k ## number of ratings nratings(Jester5k) ## number of ratings per user summary(rowCounts(Jester5k)) ## rating distribution hist(getRatings(Jester5k), main="Distribution of ratings") ## 'best' joke with highest average rating best <- which.max(colMeans(Jester5k)) cat(JesterJokes[best])
The 100k MovieLense
ratings data set. The data was collected through the MovieLens web site
(movielens.umn.edu) during the seven-month period from September 19th,
1997 through April 22nd, 1998.
The data set contains about 100,000 ratings (1-5)
from 943 users on 1664 movies. Movie and user metadata is also provided in MovieLenseMeta
and MovieLenseUser
.
data(MovieLense)
data(MovieLense)
The format of MovieLense
is an object of class "realRatingMatrix"
The format of MovieLenseMeta
is a data.frame with movie title, year, IMDb URL and indicator variables for 19 genres.
The format of MovieLenseUser
is a data.frame with user age, sex, occupation and zip code.
GroupLens Research, https://grouplens.org/datasets/movielens/
Herlocker, J., Konstan, J., Borchers, A., Riedl, J.. An Algorithmic Framework for Performing Collaborative Filtering. Proceedings of the 1999 Conference on Research and Development in Information Retrieval. Aug. 1999.
data(MovieLense) MovieLense ## look at the first few ratings of the first user head(as(MovieLense[1,], "list")[[1]]) ## visualize part of the matrix image(MovieLense[1:100,1:100]) ## number of ratings per user hist(rowCounts(MovieLense)) ## number of ratings per movie hist(colCounts(MovieLense)) ## mean rating (averaged over users) mean(rowMeans(MovieLense)) ## available movie meta information head(MovieLenseMeta) ## available user meta information head(MovieLenseUser)
data(MovieLense) MovieLense ## look at the first few ratings of the first user head(as(MovieLense[1,], "list")[[1]]) ## visualize part of the matrix image(MovieLense[1:100,1:100]) ## number of ratings per user hist(rowCounts(MovieLense)) ## number of ratings per movie hist(colCounts(MovieLense)) ## mean rating (averaged over users) mean(rowMeans(MovieLense)) ## available movie meta information head(MovieLenseMeta) ## available user meta information head(MovieLenseUser)
Vroots visited by users in a one week timeframe.
data(MSWeb)
data(MSWeb)
The format is: Formal class "binaryRatingMatrix"
.
The data was created by sampling and processing the www.microsoft.com logs. The data records the use of www.microsoft.com by 38000 anonymous, randomly-selected users. For each user, the data lists all the areas of the web site (Vroots) that user visited in a one week timeframe in February 1998.
This dataset contains 32710 valid users and 285 Vroots.
Asuncion, A., Newman, D.J. (2007). UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science. https://archive.ics.uci.edu/
J. Breese, D. Heckerman., C. Kadie (1998). Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI.
data(MSWeb) MSWeb nratings(MSWeb) ## look at first two users as(MSWeb[1:2,], "list") ## items per user hist(rowCounts(MSWeb), main="Distribution of Vroots visited per user")
data(MSWeb) MSWeb nratings(MSWeb) ## look at first two users as(MSWeb[1:2,], "list") ## items per user hist(rowCounts(MSWeb), main="Distribution of Vroots visited per user")
Provides the generic for normalize/denormalize and a method to normalize/denormalize the ratings in a realRatingMatrix.
normalize(x, ...) ## S4 method for signature 'realRatingMatrix' normalize(x, method="center", row=TRUE) denormalize(x, ...) ## S4 method for signature 'realRatingMatrix' denormalize(x, method=NULL, row=NULL, factors=NULL)
normalize(x, ...) ## S4 method for signature 'realRatingMatrix' normalize(x, method="center", row=TRUE) denormalize(x, ...) ## S4 method for signature 'realRatingMatrix' denormalize(x, method=NULL, row=NULL, factors=NULL)
x |
a realRatingMatrix. |
method |
normalization method. Currently "center" or "Z-score". |
row |
logical; normalize rows (or the columns)? |
factors |
a list with the factors to be used for denormalizing
(elements are "mean" and "sds"). Usually these are not specified and the
values stored in |
... |
further arguments (currently unused). |
Normalization tries to reduce the individual rating bias by row centering the data, i.e., by subtracting from each available rating the mean of the ratings of that user (row). Z-score in addition divides by the standard deviation of the row/column. Normalization can also be done on columns.
Denormalization reverses normalization. It uses the normalization information stored in x unless the user specifies method, row and factors.
A normalized realRatingMatrix.
## create a matrix with ratings m <- matrix(sample(c(NA,0:5),50, replace=TRUE, prob=c(.5,rep(.5/6,6))), nrow=5, ncol=10, dimnames = list(users=paste('u', 1:5, sep=''), items=paste('i', 1:10, sep=''))) ## do normalization r <- as(m, "realRatingMatrix") r_n1 <- normalize(r) r_n2 <- normalize(r, method="Z-score") r r_n1 r_n2 ## show normalized data image(r, main="Raw Data") image(r_n1, main="Centered") image(r_n2, main="Z-Score Normalization")
## create a matrix with ratings m <- matrix(sample(c(NA,0:5),50, replace=TRUE, prob=c(.5,rep(.5/6,6))), nrow=5, ncol=10, dimnames = list(users=paste('u', 1:5, sep=''), items=paste('i', 1:10, sep=''))) ## do normalization r <- as(m, "realRatingMatrix") r_n1 <- normalize(r) r_n2 <- normalize(r, method="Z-score") r r_n1 r_n2 ## show normalized data image(r, main="Raw Data") image(r_n1, main="Centered") image(r_n2, main="Z-Score Normalization")
Creates precision-recall or ROC plots for recommender evaluation results.
## S4 method for signature 'evaluationResults' plot(x, y, avg = TRUE, add=FALSE, type= "b", annotate = FALSE, ...) ## S4 method for signature 'evaluationResultList' plot(x, y, xlim=NULL, ylim=NULL, col = NULL, pch = NULL, lty = 1, avg = TRUE, type = "b", annotate= 0, legend="bottomright", ...)
## S4 method for signature 'evaluationResults' plot(x, y, avg = TRUE, add=FALSE, type= "b", annotate = FALSE, ...) ## S4 method for signature 'evaluationResultList' plot(x, y, xlim=NULL, ylim=NULL, col = NULL, pch = NULL, lty = 1, avg = TRUE, type = "b", annotate= 0, legend="bottomright", ...)
x |
the object to be plotted. |
y |
a character string indicating the type of plot (e.g., "ROC" or "prec/rec"). |
avg |
plot average of runs? |
add |
add to a plot? |
type |
line type (see |
annotate |
annotate N (recommendation list size) to plot. |
xlim , ylim
|
plot limits (see |
col |
colors (see |
pch |
point symbol to use (see |
lty |
line type (see |
legend |
where to place legend (see |
... |
further arguments passed on to |
evaluationResults
,
evaluationResultList
.
See
evaluate
for examples.
Creates recommendations using a recommender model and data about new users.
## S4 method for signature 'Recommender' predict(object, newdata, n = 10, data=NULL, type="topNList", ...)
## S4 method for signature 'Recommender' predict(object, newdata, n = 10, data=NULL, type="topNList", ...)
object |
a recommender model (class |
newdata |
data for active users (class |
n |
number of recommendations in the top-N list. |
data |
training data needed by some recommender algorithms if
|
type |
type of recommendation. The default type is
|
... |
further arguments. |
Returns an object of class "topNList"
or of other appropriate
classes.
data("MovieLense") MovieLense100 <- MovieLense[rowCounts(MovieLense) >100,] train <- MovieLense100[1:50] rec <- Recommender(train, method = "POPULAR") rec ## create top-N recommendations for new users pre <- predict(rec, MovieLense100[101:102], n = 10) pre as(pre, "list") ## predict ratings for new users pre <- predict(rec, MovieLense100[101:102], type="ratings") pre as(pre, "matrix")[,1:10] ## create recommendations using user ids with ids 1..10 in the ## training data pre <- predict(rec, 1:10 , data = train, n = 10) pre as(pre, "list")
data("MovieLense") MovieLense100 <- MovieLense[rowCounts(MovieLense) >100,] train <- MovieLense100[1:50] rec <- Recommender(train, method = "POPULAR") rec ## create top-N recommendations for new users pre <- predict(rec, MovieLense100[101:102], n = 10) pre as(pre, "list") ## predict ratings for new users pre <- predict(rec, MovieLense100[101:102], type="ratings") pre as(pre, "matrix")[,1:10] ## create recommendations using user ids with ids 1..10 in the ## training data pre <- predict(rec, 1:10 , data = train, n = 10) pre as(pre, "list")
Defines a common class for rating data.
A virtual Class: No objects may be created from it.
signature(x = "ratingMatrix", i = "ANY", j = "ANY", drop = "ANY")
: subset the rating matrix (drop
is ignorred).
signature(from = "ratingMatrix", to = "list")
signature(from = "ratingMatrix", to = "data.frame")
: a data.frame with three columns. Col 1 contains user ids, col 2 contains item ids and col 3 contains ratings.
signature(x = "ratingMatrix")
: number of ratings per column.
signature(x = "ratingMatrix")
: number of ratings per row.
signature(x = "ratingMatrix")
: column-wise rating means.
signature(x = "ratingMatrix")
: row-wise rating means.
signature(x = "ratingMatrix")
: dimensions of the rating matrix.
signature(x = "ratingMatrix", value = "list")
: replace dimnames.
signature(x = "ratingMatrix")
: retrieve dimnames.
signature(x = "ratingMatrix")
: returns a list with normalization information for the matrix (NULL if data is not normalized).
signature(x = "ratingMatrix")
: returns all
ratings in x
as a numeric vector.
signature(x = "ratingMatrix")
: returns the ratings as a sparse matrix. The format is different for binary and real rating matices.
signature(x = "ratingMatrix")
: returns a sparse logical matrix with TRUE for user-item combinations which have a rating.
signature(x = "ratingMatrix")
: plot the matrix.
signature(x = "ratingMatrix")
: number of ratings in the matrix.
signature(x = "ratingMatrix")
: sample from users (rows).
signature(object = "ratingMatrix")
See implementing classes
realRatingMatrix
and
binaryRatingMatrix
.
See getList
,
getData.frame
,
similarity
,
dissimilarity
and
dissimilarity
.
A matrix containing ratings (typically 1-5 stars, etc.).
Objects can be created by calls of the form new("realRatingMatrix", data = m)
, where m
is sparse matrix of class
dgCMatrix
in package Matrix
or by coercion from a regular matrix, a data.frame containing user/item/rating triplets as rows, or
a sparse matrix in triplet form (dgTMatrix
in package Matrix).
data
:Object of class
"dgCMatrix"
, a sparse matrix
defined in package Matrix. Note that this matrix drops NAs instead
of zeroes. Operations on "dgCMatrix"
potentially will delete
zeroes.
normalize
:NULL
or a list with normalizaton factors.
Class "ratingMatrix"
, directly.
signature(from = "matrix", to = "realRatingMatrix")
: Note that
unknown ratings have to be encoded in the matrix as NA and not as 0 (which would mean an actual rating of 0).
signature(from = "realRatingMatrix", to = "matrix")
signature(from = "data.frame", to = "realRatingMatrix")
:
coercion from a data.frame with three columns.
Col 1 contains user ids, col 2 contains item ids and
col 3 contains ratings.
signature(from = "realRatingMatrix", to = "data.frame")
: produces user/item/rating triplets.
signature(from = "realRatingMatrix", to = "dgTMatrix")
signature(from = "dgTMatrix", to = "realRatingMatrix")
signature(from = "realRatingMatrix", to = "dgCMatrix")
signature(from = "dgCMatrix", to = "realRatingMatrix")
signature(from = "realRatingMatrix", to = "ngCMatrix")
signature(x = "realRatingMatrix")
: create a
"binaryRatingMatrix"
by setting all ratings larger or equal to
the argument minRating
as 1 and all others to 0.
signature(x = "realRatingMatrix")
: create
top-N lists from the ratings in x. Arguments are
n
(defaults to 10),
randomize
(default is NULL
) and
minRating
(default is NA
).
Items with a rating below minRating
will not be part of the
top-N list. randomize
can be used to get diversity in the
predictions by randomly selecting items with a bias to higher rated
items. The bias is introduced by choosing the items with a probability
proportional to the rating .
The larger the value
the more likely it is to get very highly rated items and a negative
value for
randomize
will select low-rated items.
signature(x = "realRatingMatrix")
: removes
all ratings in x
for which ratings are available in
the realRatingMatrix (of same dimensions as x
)
passed as the argument known
.
signature(x = "realRatingMatrix")
: calculate
the standard deviation of ratings for rows (users).
signature(x = "realRatingMatrix")
: calculate
the standard deviation of ratings for columns (items).
See ratingMatrix
inherited methods,
binaryRatingMatrix
,
topNList
,
getList
and getData.frame
.
Also see dgCMatrix
,
dgTMatrix
and
ngCMatrix
ngCMatrix
in Matrix.
## create a matrix with ratings m <- matrix(sample(c(NA,0:5),100, replace=TRUE, prob=c(.7,rep(.3/6,6))), nrow=10, ncol=10, dimnames = list( user=paste('u', 1:10, sep=''), item=paste('i', 1:10, sep='') )) m ## coerce into a realRatingMAtrix r <- as(m, "realRatingMatrix") r ## get some information dimnames(r) rowCounts(r) ## number of ratings per user colCounts(r) ## number of ratings per item colMeans(r) ## average item rating nratings(r) ## total number of ratings hasRating(r) ## user-item combinations with ratings ## histogram of ratings hist(getRatings(r), breaks="FD") ## inspect a subset image(r[1:5,1:5]) ## coerce it back to see if it worked as(r, "matrix") ## coerce to data.frame (user/item/rating triplets) as(r, "data.frame") ## binarize into a binaryRatingMatrix with all 4+ rating a 1 b <- binarize(r, minRating=4) b as(b, "matrix")
## create a matrix with ratings m <- matrix(sample(c(NA,0:5),100, replace=TRUE, prob=c(.7,rep(.3/6,6))), nrow=10, ncol=10, dimnames = list( user=paste('u', 1:10, sep=''), item=paste('i', 1:10, sep='') )) m ## coerce into a realRatingMAtrix r <- as(m, "realRatingMatrix") r ## get some information dimnames(r) rowCounts(r) ## number of ratings per user colCounts(r) ## number of ratings per item colMeans(r) ## average item rating nratings(r) ## total number of ratings hasRating(r) ## user-item combinations with ratings ## histogram of ratings hist(getRatings(r), breaks="FD") ## inspect a subset image(r[1:5,1:5]) ## coerce it back to see if it worked as(r, "matrix") ## coerce to data.frame (user/item/rating triplets) as(r, "data.frame") ## binarize into a binaryRatingMatrix with all 4+ rating a 1 b <- binarize(r, minRating=4) b as(b, "matrix")
Learns a recommender model from given data.
Recommender(data, ...) ## S4 method for signature 'ratingMatrix' Recommender(data, method, parameter=NULL)
Recommender(data, ...) ## S4 method for signature 'ratingMatrix' Recommender(data, method, parameter=NULL)
data |
training data. |
method |
a character string defining the recommender method to use (see details). |
parameter |
parameters for the recommender algorithm. |
... |
further arguments. |
Recommender uses the registry mechanism from package registry
to manage methods. This let's the user easily specify and add new methods.
The registry is called recommenderRegistry
. See examples section.
An object of class 'Recommender'.
Recommender
,
ratingMatrix
,
predict
.
data("MSWeb") MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) >10,], 100) rec <- Recommender(MSWeb10, method = "POPULAR") rec getModel(rec) ## save and read a recommender model saveRDS(rec, file = "rec.rds") rec2 <- readRDS("rec.rds") rec2 unlink("rec.rds") ## look at registry and a few methods recommenderRegistry$get_entry_names() recommenderRegistry$get_entry("POPULAR", dataType = "binaryRatingMatrix") recommenderRegistry$get_entry("SVD", dataType = "realRatingMatrix")
data("MSWeb") MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) >10,], 100) rec <- Recommender(MSWeb10, method = "POPULAR") rec getModel(rec) ## save and read a recommender model saveRDS(rec, file = "rec.rds") rec2 <- readRDS("rec.rds") rec2 unlink("rec.rds") ## look at registry and a few methods recommenderRegistry$get_entry_names() recommenderRegistry$get_entry("POPULAR", dataType = "binaryRatingMatrix") recommenderRegistry$get_entry("SVD", dataType = "realRatingMatrix")
Represents a recommender model learned for a given data set (a rating matrix).
Objects are created by
the creator function Recommender(data, method, parameter = NULL)
method
:Object of class "character"
;
used recommendation method.
dataType
:Object of class "character"
;
concrete class of the input data.
ntrain
:Object of class "integer"
;
size of training set.
model
:Object of class "list"
; the model.
predict
:Object of class "function"
; code to compute
a recommendation using the model.
signature(x = "Recommender")
: retrieve the model.
signature(object = "Recommender")
: create
recommendations for new data (argument newdata
).
signature(object = "Recommender")
See Recommender
for the constructor function and
a description of availble methods.
Coerce from and to a
sparse matrix representation where NA
s are not explicitly stored.
dropNA(x) dropNA2matrix(x) dropNAis.na(x)
dropNA(x) dropNA2matrix(x) dropNAis.na(x)
x |
a matrix for |
The representation is based on
the sparse dgCMatrix
in Matrix but instead of zeros, NA
s are dropped.
This is achieved by the following:
Zeros are represented with a very small value (.Machine$double.xmin
)
so they do not get dropped in the sparse representation.
NAs are converted to 0 before cercions to dgCMatrix
to make them not explicitly stored.
Caution: Be careful when working with the sparse matrix and sparse matrix operations (multiplication, addition, etc.) directly.
Sparse matrix operations will see 0 where NAs should be.
Actual zero ratings have a small, but non-zero value (.Machine$double.xmin
).
Sparse matrix operations that can result in a true 0
need to be followed by replacing the 0 with .Machine$double.xmin
or other operations
(like subsetting) may drop the 0.
dropNAis.na()
correctly finds NA values in a sparse matrix with dropped NA values, while
is.na()
does not work.
dropNA2matrix()
converts the sparse representation into a dense matrix. NAs represented by
dropped values are converted to true NAs. Zeros are recovered by using zapsmall()
which replaces
small values by 0.
Returns a dgCMatrix or a matrix, respectively.
dgCMatrix
in Matrix.
m <- matrix(sample(c(NA,0:5),50, replace=TRUE, prob=c(.5,rep(.5/6,6))), nrow=5, ncol=10, dimnames = list(users=paste('u', 1:5, sep=''), items=paste('i', 1:10, sep=''))) m ## drop all NAs in the representation. Zeros are represented by very small values. sparse <- dropNA(m) sparse ## convert back to matrix dropNA2matrix(sparse) ## Note: be careful with the sparse representation! ## Do not use is.na, but use dropNAis.na(sparse)
m <- matrix(sample(c(NA,0:5),50, replace=TRUE, prob=c(.5,rep(.5/6,6))), nrow=5, ncol=10, dimnames = list(users=paste('u', 1:5, sep=''), items=paste('i', 1:10, sep=''))) m ## drop all NAs in the representation. Zeros are represented by very small values. sparse <- dropNA(m) sparse ## convert back to matrix dropNA2matrix(sparse) ## Note: be careful with the sparse representation! ## Do not use is.na, but use dropNAis.na(sparse)
Recommendations a Top-N list.
Objects can be created by
predict
with a recommender model and new data. Alternatively,
objects can be created from a realRatingMatrix using
getTopNLists
.
ratings
:Object of class "list"
.
Each element in the list represents a top-N recommendation
(an integer vector) with item IDs (column numbers in the rating
matrix). The items are ordered in each vector.
items
:Object of class "list"
or NULL
.
If available, a list of the same structure as items
with the
ratings.
itemLabels
:Object of class "character"
n
:Object of class "integer"
specifying the
number of items in each recommendation.
Note that the actual number
on recommended items can be less depending on the data and the
used algorithm.
signature(from = "topNList", to = "list")
: returns a
list with the items (labels) in the topNList.
signature(from = "topNList", to = "realRatingMatrix")
: creates a rating Matrix with entries for the items in the topN list.
signature(from = "topNList", to = "dgTMatrix")
signature(from = "topNList", to = "dgCMatrix")
signature(from = "topNList", to = "ngCMatrix")
signature(from = "topNList", to = "matrix")
: returns
a dense matrix with the ratings for the top-N items. All other items have a rating of NA.
signature(x = "topNList")
: combine several topN lists into a single list. The lists need to be for the same data (i.e., items).
signature(x = "topNList")
: returns only the best
n recommendations (second argument is n
which defaults to 10).
The additional argument minRating
can be used to remove all
entries with a rating below this value.
signature(x = "topNList")
: for how many users
does this object contain a top-N list?
signature(x = "topNList")
:
remove items from the top-N list which are known (have a rating)
for the user given as a ratingMatrix passed on as argument
known
.
signature(x = "topNList")
: in how many top-N
does each item occur?
signature(x = "topNList")
: number of recommendations per user.
signature(object = "topNList")
evaluate
,
getList
,
realRatingMatrix