The operating envelope of data pruning: what holds, and what breaks
Notes from a multi-dataset NLP study on how aggressively training data can be trimmed before accuracy starts to slip.

Anyone who's trained a model at production scale has done some version of the math: GPU-hours times dollars per hour times how many runs you'll