Per-skill evaluation — regression health (golden-set pass rate, named scores) and lift vs raw Claude, side by side.