This project is based on the Bioresponse dataset from Kaggle. The first column contains experimental data describing an actual biological response: the molecule either elicited the response (1) or did not (0). The remaining columns are molecular descriptors (d1 through d1776), calculated properties that capture some characteristics of the molecule, such as size, shape, or elemental constitution.
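As a rough illustration of that layout, the sketch below separates the response column from the descriptor columns using only the Python standard library. The inline sample, its header names, and the helper `split_response_and_descriptors` are made up for this example; the real file has 1776 descriptor columns and its header names may differ.

```python
import csv
import io

# Tiny stand-in for the real file: same layout (response first, descriptors
# after), but only 3 descriptors. Header names and values are invented.
SAMPLE = """response,d1,d2,d3
1,0.10,0.50,0.00
0,0.90,0.20,1.00
1,0.30,0.70,0.00
"""

def split_response_and_descriptors(text):
    """Return (descriptor_names, y, X) parsed from CSV text.

    y holds the first column as ints (0 or 1); X holds the remaining
    columns as floats, one list per molecule.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    y, X = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        y.append(int(row[0]))
        X.append([float(value) for value in row[1:]])
    return header[1:], y, X

names, y, X = split_response_and_descriptors(SAMPLE)
print(names)
print(y)
print(X[0])
```

Keeping the response in `y` and the descriptors in `X` matches the usual convention for supervised learning code, so either can be handed to a classifier without further reshaping.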
The goal is to build a model that accurately predicts the response from the descriptors. This is a binary classification problem.
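To make the classification setup concrete, here is a minimal baseline sketch: a logistic regression trained with batch gradient descent, written in plain Python. It is not the model used in this project, only an illustration of mapping descriptors to a 0/1 response. The data is synthetic (5 invented descriptors rather than 1776), and `make_toy_data`, `fit_logistic`, and `predict` are hypothetical helper names.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_toy_data(n_rows, rng):
    """Synthetic stand-in for the dataset: 5 made-up descriptors, 0/1 response."""
    true_w = [2.0, -1.0, 0.5, 0.0, 0.0]  # assumed, for illustration only
    X, y = [], []
    for _ in range(n_rows):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        score = sum(w * v for w, v in zip(true_w, x)) + rng.gauss(0.0, 0.1)
        X.append(x)
        y.append(1 if score > 0 else 0)
    return X, y

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - target
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Predict 1 when the estimated probability of a response is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_toy_data(200, rng)
X_test, y_test = make_toy_data(100, rng)

w, b = fit_logistic(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Evaluating on rows that were not used for fitting is the important part of the sketch: with 1776 descriptors, a model can easily memorize the training set, so accuracy on held-out data is the more honest measure.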