Evaluation

We evaluated our models on the SmartDoc 2015 dataset.

Protocol

We use the Jaccard Index (JI) as our metric. It summarizes how accurately each method segments the page contour, and it penalizes methods that fail to detect the document in some frames.

Evaluation starts by using the size and coordinates of the document in each frame to apply a perspective transform to the quadrilateral coordinates of the submitted method S and the ground truth G, yielding the corrected quadrilaterals S0 and G0. This transform ensures that all evaluation measures are comparable within the document reference system.

For each frame f, we compute the Jaccard Index, which measures how much the corrected quadrilaterals overlap: JI(f) = area(G0 ∩ S0) / area(G0 ∪ S0), that is, the area of the intersection polygon of the detected and ground-truth quadrilaterals divided by the area of their union polygon. The overall score for each method is the mean of JI(f) over all frames in the test dataset.
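The per-frame score and its average can be sketched in a few lines of Python. This is a minimal illustration, not the official SmartDoc evaluation code. It assumes the quadrilaterals are convex and have already been perspective-corrected into the document reference system, and it assumes that a frame with no detection scores 0. The function names (`jaccard_index`, `mean_jaccard`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Quad = Sequence[Point]


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive when vertices are in counter-clockwise order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def ccw(poly: Sequence[Point]) -> List[Point]:
    """Return the vertices in counter-clockwise order."""
    pts = list(poly)
    return pts if signed_area(pts) >= 0 else pts[::-1]


def clip_convex(subject: Quad, clip: Quad) -> List[Point]:
    """Sutherland-Hodgman clipping: intersection of two convex polygons."""
    output = ccw(subject)
    clip_pts = ccw(clip)
    for i in range(len(clip_pts)):
        if not output:
            break
        a, b = clip_pts[i], clip_pts[(i + 1) % len(clip_pts)]

        def side(p: Point) -> float:
            # Positive when p lies to the left of edge a->b (inside the clip polygon).
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        previous, output = output, []
        for j in range(len(previous)):
            p, q = previous[j], previous[(j + 1) % len(previous)]
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
                t = sp / (sp - sq)  # where segment p->q crosses the clip edge
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output


def jaccard_index(detected: Optional[Quad], truth: Quad) -> float:
    """Intersection area over union area; 0.0 for a missed detection (assumption)."""
    if detected is None:
        return 0.0
    inter = abs(signed_area(clip_convex(detected, truth)))
    union = abs(signed_area(detected)) + abs(signed_area(truth)) - inter
    return inter / union if union > 0 else 0.0


def mean_jaccard(frames: Sequence[Tuple[Optional[Quad], Quad]]) -> float:
    """Average the per-frame scores over all (detected, truth) pairs."""
    return sum(jaccard_index(s, g) for s, g in frames) / len(frames)


truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (0.5, 1.0)]
print(jaccard_index(truth, truth))    # perfect overlap
print(jaccard_index(shifted, truth))  # half of the page overlaps
print(mean_jaccard([(truth, truth), (shifted, truth), (None, truth)]))
```

Because missed frames contribute 0 to the mean, a method that skips hard frames cannot raise its overall score by doing so.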

Results

The following are the evaluation results of our models on the SmartDoc 2015 dataset:

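| Models | bg01 | bg02 | bg03 | bg04 | bg05 | Overall |
| --- | --- | --- | --- | --- | --- | --- |
| FastViT_SA24 | 0.9944 | 0.9932 | 0.9940 | 0.9937 | 0.9929 | 0.9937 |
| MBV2_140 | 0.9917 | 0.9901 | 0.9921 | 0.9899 | 0.9891 | 0.9909 |
| FastViT_T8 | 0.9920 | 0.9894 | 0.9918 | 0.9896 | 0.9888 | 0.9906 |
| LC100 | 0.9908 | 0.9877 | 0.9905 | 0.9894 | 0.9854 | 0.9892 |
| LC050 | 0.9847 | 0.9822 | 0.9865 | 0.9811 | 0.9722 | 0.9826 |
| PReg-LC050-XAtt | 0.9663 | 0.9606 | 0.9664 | 0.9630 | 0.9199 | 0.9596 |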

Parameter Settings

The table below details the parameter settings used for each model:

| Model Name | ModelType | ModelCfg |
| --- | --- | --- |
| FastViT_SA24 | heatmap | fastvit_sa24 |
| MBV2-140 | heatmap | mobilenetv2_140 |
| FastViT_T8 | heatmap | fastvit_t8 |
| LC100 | heatmap | lcnet100 |
| LC050 | heatmap | lcnet050 |
| PReg-LC050-XAtt | point | lcnet050 |

For example, to use the LC050 model, call as follows:

from docaligner import DocAligner

# LC050 row of the table above: ModelType 'heatmap', ModelCfg 'lcnet050'
model = DocAligner(model_type='heatmap', model_cfg='lcnet050')
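To avoid retyping the two settings for each model, the parameter table can be kept as a small lookup. The sketch below only restates that table; `MODEL_SETTINGS` and `settings_for` are hypothetical helper names and are not part of docaligner.

```python
from typing import Dict

# Rows of the parameter table: model name -> keyword arguments for DocAligner.
MODEL_SETTINGS: Dict[str, Dict[str, str]] = {
    "FastViT_SA24": {"model_type": "heatmap", "model_cfg": "fastvit_sa24"},
    "MBV2-140": {"model_type": "heatmap", "model_cfg": "mobilenetv2_140"},
    "FastViT_T8": {"model_type": "heatmap", "model_cfg": "fastvit_t8"},
    "LC100": {"model_type": "heatmap", "model_cfg": "lcnet100"},
    "LC050": {"model_type": "heatmap", "model_cfg": "lcnet050"},
    "PReg-LC050-XAtt": {"model_type": "point", "model_cfg": "lcnet050"},
}


def settings_for(name: str) -> Dict[str, str]:
    """Return a copy of the settings for a model name (hypothetical helper)."""
    if name not in MODEL_SETTINGS:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(MODEL_SETTINGS)}")
    return dict(MODEL_SETTINGS[name])


print(settings_for("LC050"))
# With docaligner installed, the result can be unpacked into the constructor:
#   model = DocAligner(**settings_for("LC050"))
```

Note that LC050 and PReg-LC050-XAtt share the same ModelCfg and differ only in ModelType, so both settings are needed to identify a model.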

Comparative Overview

The table below compares the models by parameter count, FP32 size, FLOPs, and overall score:

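| Model Name | Parameters (M) | FP32 Size (MB) | FLOPs (G) | Overall Score |
| --- | --- | --- | --- | --- |
| FastViT_SA24 | 20.8 | 83.1 | 8.5 | 0.9937 |
| MBV2-140 | 3.7 | 14.7 | 2.4 | 0.9909 |
| FastViT_T8 | 3.3 | 13.1 | 1.7 | 0.9906 |
| LC100 | 1.2 | 4.9 | 1.6 | 0.9892 |
| LC050 | 0.4 | 1.7 | 1.2 | 0.9826 |
| PReg-LC050-XAtt | 1.1 | 4.5 | 0.22 | 0.9596 |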
Tip: Choosing a model involves trade-offs. If you need a smaller model, LC050 is a good option, at the cost of a lower overall score (0.9826 versus 0.9937). Otherwise, use the default FastViT_SA24, which takes more space (83.1 MB versus 1.7 MB in FP32).
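One way to make this trade-off concrete is to pick the highest-scoring model that fits a size or compute budget. The sketch below hard-codes the figures from the comparison table; `ModelStats` and `best_under_budget` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple, Optional


class ModelStats(NamedTuple):
    name: str
    params_m: float  # parameters, in millions
    fp32_mb: float   # FP32 size, in MB
    gflops: float    # FLOPs, in G
    score: float     # overall score on SmartDoc 2015


# Figures copied from the comparison table.
MODELS: List[ModelStats] = [
    ModelStats("FastViT_SA24", 20.8, 83.1, 8.5, 0.9937),
    ModelStats("MBV2-140", 3.7, 14.7, 2.4, 0.9909),
    ModelStats("FastViT_T8", 3.3, 13.1, 1.7, 0.9906),
    ModelStats("LC100", 1.2, 4.9, 1.6, 0.9892),
    ModelStats("LC050", 0.4, 1.7, 1.2, 0.9826),
    ModelStats("PReg-LC050-XAtt", 1.1, 4.5, 0.22, 0.9596),
]


def best_under_budget(
    max_fp32_mb: float, max_gflops: float = float("inf")
) -> Optional[ModelStats]:
    """Return the highest-scoring model within both budgets, or None if none fits."""
    candidates = [
        m for m in MODELS if m.fp32_mb <= max_fp32_mb and m.gflops <= max_gflops
    ]
    return max(candidates, key=lambda m: m.score, default=None)


print(best_under_budget(5.0).name)    # LC100
print(best_under_budget(2.0).name)    # LC050
print(best_under_budget(100.0).name)  # FastViT_SA24
```

With a 5 MB budget the choice is LC100 rather than LC050, since it still fits and scores higher; LC050 only wins once the budget drops below 4.9 MB.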