
Thursday, June 30, 2022

Cross Validation and training-validation-test split

 Train-Validation-Test split

  1. The training dataset (dataset 1, in K versions) is used to train a few candidate models.
  2. The validation dataset (dataset 2, in K versions) is used to evaluate the candidate models.
  3. One of the candidates is chosen.
  4. The chosen model is retrained on a new training dataset (dataset 3 = all the data used in steps 1 and 2).
  5. The retrained model is evaluated on the test dataset (dataset 4: an unseen dataset).
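Steps 1 and 2 together are called cross validation: each candidate model is trained and evaluated K times, each time with a different training/validation dataset pair, and the average score is used for the decision in step 3. Ideally the K datasets are independent of each other. When that is not possible, we can use k-fold cross validation, which splits a single dataset into k parts and uses each part once as the validation dataset.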
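
The five steps can be sketched in plain Python as follows. This is only a minimal illustration, not a recommended implementation: the synthetic data, the two candidate models (`fit_mean`, `fit_line`), the 80/20 split, and k = 5 are all assumptions made for the example. In practice a library such as scikit-learn would normally do the splitting.

```python
import random
import statistics

def fit_mean(xs, ys):
    """Candidate A (hypothetical): ignore x and always predict the mean of y."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate B (hypothetical): least-squares line y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def k_fold_score(fit, xs, ys, k):
    """Steps 1 and 2: train and validate k times, return the average score."""
    n = len(xs)
    scores = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))  # every k-th point is held out
        train_idx = [i for i in range(n) if i not in val_idx]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        val = sorted(val_idx)
        scores.append(mse(model, [xs[i] for i in val], [ys[i] for i in val]))
    return statistics.fmean(scores)

# Synthetic data, assumed for illustration: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

# Set the test dataset (dataset 4) aside before any model selection.
x_dev, y_dev = xs[:80], ys[:80]
x_test, y_test = xs[80:], ys[80:]

candidates = {"mean": fit_mean, "line": fit_line}
cv_scores = {name: k_fold_score(fit, x_dev, y_dev, k=5)
             for name, fit in candidates.items()}

best = min(cv_scores, key=cv_scores.get)      # step 3: lowest average error
final_model = candidates[best](x_dev, y_dev)  # step 4: retrain on dataset 3
test_mse = mse(final_model, x_test, y_test)   # step 5: evaluate on dataset 4

print(best, cv_scores, test_mse)
```

The test data is touched only once, at the very end, so `test_mse` is not influenced by the choice made in step 3.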