Let's say we are trying to use xgboost to make predictions about our data, and here is the sample data we're going to use:

Some terminology before moving on. R's xgboost package uses the term label for the expected output when we're building our model. Yes, it is really confusing. The label is the value our model learns to predict, so it becomes the final output of our predictions.

Basically, what we're trying to find out is whether smoking and high sugar intake lead to a person having a disease. This is fake data, of course. There are people who smoke and eat as much choc as they like and still look sharp (not me tho).

First, we will create this data in R. The code loads some libraries, creates a data frame called 'a', and then converts 'a' into a data table 'd'.

# Load the libraries used in this example
require(xgboost)
require(Matrix)
require(data.table)
# Install vcd if it is missing, then load it
if (!require('vcd')) { install.packages('vcd'); library(vcd) }
# Build the sample data set as a data frame called 'a'
a = data.frame(id=c(1,2,3,4,5,6,7,8,9,10,11,12,13