Find the best sample indexes with the following steps:
- Randomly select a fraction (`ratio`) of the original data as training samples. The remaining data serve as the out-of-bag (OOB) samples.
- Run 10 epochs:
  - Train a logistic regression model on the selected training samples.
  - Make predictions on the training samples. Since the true labels are unknown, the training samples are the only data available for measuring performance.
  - If the new AUC improves on the previously recorded one, record the sample indexes and the AUC.
  - Remove the misclassified samples from the training set and replace them with the same number of samples taken from the OOB samples. If there are not enough OOB samples, exit the current epoch.
- Train a logistic regression model on the selected training samples.
- Use the final indexes as the training data for the next classifier.
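The steps above can be sketched in Python as follows. This is a minimal sketch, not the original implementation: to avoid extra dependencies it uses a small NumPy logistic regression (batch gradient descent) and a pairwise AUC, and the names `find_best_sample_index`, `fit_logreg`, `n_epochs`, and `threshold` are hypothetical. It also assumes that misclassified samples are discarded rather than returned to the OOB pool, that predictions are turned into labels at a 0.5 probability threshold, and that running out of OOB samples ends the search.

```python
import numpy as np


def _with_bias(X):
    # Append a constant column so the last weight acts as the intercept.
    return np.hstack([X, np.ones((len(X), 1))])


def fit_logreg(X, y, lr=0.1, n_iter=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    Xb = _with_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-_with_bias(X) @ w))


def auc(y, scores):
    """AUC = P(random positive scores above random negative), ties count 1/2."""
    pos, neg = scores[y == 1], scores[y == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined when only one class is present
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def find_best_sample_index(X, y, ratio=0.5, n_epochs=10, threshold=0.5, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_train = int(len(y) * ratio)
    train_idx, oob_idx = perm[:n_train], perm[n_train:]  # random split by ratio

    best_auc, best_idx = -np.inf, train_idx.copy()
    for _ in range(n_epochs):
        # Train and evaluate on the training samples only.
        w = fit_logreg(X[train_idx], y[train_idx])
        scores = predict_proba(w, X[train_idx])
        cur_auc = auc(y[train_idx], scores)
        if cur_auc > best_auc:  # record indexes whenever the AUC improves
            best_auc, best_idx = cur_auc, train_idx.copy()

        # Swap misclassified training samples for the same number of OOB samples.
        wrong = (scores >= threshold).astype(int) != y[train_idx]
        n_wrong = int(wrong.sum())
        if n_wrong > len(oob_idx):
            break  # assumption: an exhausted OOB pool ends the search
        train_idx = np.concatenate([train_idx[~wrong], oob_idx[:n_wrong]])
        oob_idx = oob_idx[n_wrong:]
    return best_idx, best_auc
```

The returned `best_idx` can then be used to train the final model, e.g. `fit_logreg(X[best_idx], y[best_idx])`, and passed on to the next classifier. In practice, scikit-learn's `LogisticRegression` and `sklearn.metrics.roc_auc_score` could replace the two helper functions.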