In this project, students will apply different families of regression methods to a large data set. They will summarize the methods in a written report that assesses the strengths and weaknesses of each one on that data set. They will also choose one of the methods surveyed in class, implement it from scratch, and apply their implementation to the same data. In addition, they will have to develop new features and preprocess the data. The main challenge is the amount of data, which will not fit in memory.


  • Develop a deeper understanding of machine learning through the application of different methods and the implementation of one of your choice.
  • Work on an applied problem with very large data sets.
  • Obtain valuable real-world skills such as development of relevant features and reduction of the complexity of the data.
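
To make the memory constraint concrete, here is a minimal sketch of one possible approach: a linear regression implemented from scratch and fitted by mini-batch gradient descent, so that only one chunk of rows is held in memory at a time. It is an illustration, not the prescribed method. The names `stream_chunks`, `sgd_step` and `fit_out_of_core` are hypothetical, and the chunks are generated synthetically here; in the project they would be read block by block from disk.

```python
import random


def stream_chunks(n_rows, chunk_size, seed=0):
    """Hypothetical data source: yields (X, y) chunks of synthetic rows.

    Stands in for reading successive blocks of a large file from disk.
    The targets follow y = 2*x0 - 1*x1 + 0.5 plus a little Gaussian noise.
    """
    rng = random.Random(seed)
    true_w, true_b = [2.0, -1.0], 0.5
    for start in range(0, n_rows, chunk_size):
        X, y = [], []
        for _ in range(min(chunk_size, n_rows - start)):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            noise = rng.gauss(0, 0.01)
            X.append(x)
            y.append(sum(w * v for w, v in zip(true_w, x)) + true_b + noise)
        yield X, y


def sgd_step(w, b, X, y, lr):
    """One gradient step on the mean squared error of a single chunk."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, target in zip(X, y):
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - target
        for j, xj in enumerate(x):
            grad_w[j] += 2 * err * xj / n
        grad_b += 2 * err / n
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    return new_w, b - lr * grad_b


def fit_out_of_core(n_features, epochs=20, lr=0.1):
    """Fit weights by streaming over the data several times, chunk by chunk."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # Each pass re-opens the stream; only one chunk is in memory at once.
        for X, y in stream_chunks(n_rows=2000, chunk_size=100):
            w, b = sgd_step(w, b, X, y, lr)
    return w, b


w, b = fit_out_of_core(n_features=2)
print(w, b)
```

The fitted `w` and `b` should land close to the coefficients used to generate the synthetic data. The same loop structure carries over to other methods that can be updated incrementally; methods that need all rows at once would require a different strategy, such as subsampling or reducing the data first.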

Data sets


Target audience

Ideal for students not enrolled in Data Science or Statistical Learning Theory