Fig. 2

Model frameworks. A Framework 1, Sub-Classifier Aggregation. During training, HE-stained WSIs were cropped into patches of different sizes (\(1024\times 1024\) and \(2048\times 2048\)), and background patches (\(\text{variance}<500\)) were excluded. Tumor patches were selected using manual annotation masks for Swin-T-based transfer training. During testing, the sub-classifier produced patch-level predictions, and WSI-level predicted scores were obtained through probabilistic aggregation. B Framework 2, Multi-Instance Learning (MIL). This general MIL structure included (1) WSI cropping, (2) feature extraction using a pre-trained backbone model, (3) pseudo-label generation with a similarity measure, and (4) model training based on attention-based or graph-based learning.
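
The patch-filtering and aggregation steps of Framework 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the caption specifies only the variance threshold (500) and that aggregation is probabilistic. The function names (`crop_patches`, `is_background`, `aggregate_mean`), the non-overlapping tiling, the use of raw pixel-intensity variance, and the choice of the mean as the aggregation rule are all assumptions made for illustration.

```python
import numpy as np


def crop_patches(image, patch_size):
    """Yield non-overlapping square patches from an image array.

    Assumption: tiles do not overlap and any edge remainder is dropped;
    the caption does not state the tiling scheme.
    """
    height, width = image.shape[:2]
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            yield image[y:y + patch_size, x:x + patch_size]


def is_background(patch, threshold=500.0):
    """Return True if the patch's pixel-intensity variance is below threshold.

    The threshold of 500 comes from the caption; computing the variance
    over all raw pixel values is an assumption.
    """
    return float(np.var(patch.astype(np.float64))) < threshold


def aggregate_mean(patch_probs):
    """Combine patch-level probabilities into one WSI-level score.

    Assumption: a plain mean stands in for the unspecified
    'probabilistic aggregation'.
    """
    probs = np.asarray(patch_probs, dtype=np.float64)
    if probs.size == 0:
        raise ValueError("no patch predictions to aggregate")
    return float(probs.mean())


# Toy 8x8 grayscale "slide": three flat tiles and one high-contrast tile.
slide = np.zeros((8, 8), dtype=np.uint8)
slide[0:4, 4:8] = np.indices((4, 4)).sum(axis=0) % 2 * 255  # checkerboard

patches = list(crop_patches(slide, patch_size=4))
tissue_patches = [p for p in patches if not is_background(p)]

# Hypothetical patch-level tumor probabilities from a sub-classifier.
wsi_score = aggregate_mean([0.2, 0.4, 0.9])
```

In this toy example the three constant tiles have zero variance and are discarded as background, while the checkerboard tile is kept. In practice the patches would be read from the WSI at \(1024\times 1024\) or \(2048\times 2048\) pixels, and the probabilities would come from the trained Swin-T sub-classifier.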