XGBoost is short for “Extreme Gradient Boosting”; the term “gradient boosting” was first proposed by Friedman in the paper Greedy Function Approximation: A Gradient Boosting Machine. XGBoost builds on this original model.
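To make the “gradient boosting” idea concrete, here is a minimal sketch for squared loss using one-dimensional decision stumps as weak learners: each round fits a stump to the residuals of the current model and adds it with a small learning rate. All function names and parameters here are our own illustrative choices, not XGBoost's actual implementation.

```python
def fit_stump(x, residuals):
    """Find the threshold split of x that best fits the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def predict(xi, base, stumps, lr=0.1):
    """Sum the base prediction and the scaled stump corrections."""
    return base + lr * sum(s(xi) for s in stumps)

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Additively fit stumps to the residuals of the current model."""
    base = sum(y) / len(y)  # start from the mean target
    stumps = []
    for _ in range(n_rounds):
        pred = [predict(xi, base, stumps, lr) for xi in x]
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stumps.append(fit_stump(x, residuals))
    return base, stumps
```

XGBoost adds regularization, second-order gradients, and much deeper trees on top of this basic additive scheme, but the residual-fitting loop is the core idea.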
Random forests are an instance of the general technique of random decision forests, an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and aggregating their predictions.
The k-nearest neighbors algorithm (k-NN for short) is a type of instance-based, or lazy, learning: the function is approximated only locally, and all computation is deferred until classification. We will use the IBk nearest-neighbor implementation.
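The “lazy” part is easy to see in a sketch: training is just storing the data, and all the work happens at query time. Below is a minimal k-NN regressor in the spirit of IBk, using plain Euclidean distance; the function and variable names are our own, not IBk's API.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points nearest to `query`.

    There is no training step: the stored (train_X, train_y) pairs ARE
    the model, and distances are computed only when a query arrives.
    """
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k
```

For classification one would take a majority vote of the k neighbors' labels instead of the mean; IBk supports both modes as well as distance-weighted voting.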
In daily life it is important to know how much rain fell on a particular field, especially for agriculture, but rainfall varies in space and time, and deploying rain gauges everywhere is impractical. An alternative is to estimate rainfall from radar measurements, which are cheap and abundant enough to train models that infer the actual total rainfall.
The algorithms we chose are Random Forests, XGBoost, and IBk. We chose Random Forests because our training data has many attributes, so a single decision tree would easily overfit. XGBoost is a newer machine learning method that usually performs well on value estimation. IBk is the simplest of the three to implement for this problem.
Both XGBoost and Random Forests perform well on estimation. According to our results, XGBoost is slightly better than Random Forests, but IBk achieves the best result while also being the easiest to implement.
Team
Master of Computer Science, Northwestern University.
Master of Computer Science, Northwestern University.
Master of Computer Science, Northwestern University.