PaveCap: The First Multimodal Framework for Comprehensive Pavement Condition Assessment with Dense Captioning and PCI Estimation

1 North Dakota State University, 2 Kwame Nkrumah University of Science and Technology, 3 SMART Lab

Overall Framework for PaveCap

Abstract

This research introduces the first multimodal approach for pavement condition assessment, providing both quantitative Pavement Condition Index (PCI) predictions and qualitative descriptions.

We introduce PaveCap, a novel framework for automated pavement condition assessment. The framework consists of two main parts: a Single-Shot PCI Estimation Network and a fine-grained Dense Captioning Network. The PCI Estimation Network uses YOLOv8 for object detection, the Segment Anything Model (SAM) for zero-shot segmentation, and a four-layer convolutional neural network to predict PCI. The Dense Captioning Network uses a YOLOv8 backbone, a Transformer encoder-decoder architecture, and a convolutional feed-forward module to generate detailed descriptions of pavement conditions.

To train and evaluate these networks, we developed a pavement dataset with bounding box annotations, textual annotations, and PCI values. The PCI Estimation Network achieved a strong positive correlation (0.70) between predicted and actual PCI values, demonstrating its effectiveness in automating condition assessment. The Dense Captioning Network produced accurate pavement condition descriptions, as evidenced by its BLEU-1 (0.7445), GLEU (0.5893), and METEOR (0.7252) scores. The dense captioning model also handled complex scenarios well, in some cases even correcting errors in the ground truth data. The framework developed here can substantially improve infrastructure management and decision-making in pavement maintenance.
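For readers who want to reproduce a correlation figure like the one reported for the PCI Estimation Network, the following is a minimal sketch of a correlation check between predicted and actual PCI values. It assumes the Pearson coefficient, which the text does not state explicitly. The function name `pearson_r` and the sample values are illustrative only and are not taken from the PaveCap dataset.

```python
import math


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of co-deviations and sums of squared deviations from each mean
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_a)


# Made-up PCI values for illustration only (not from the PaveCap dataset)
predicted_pci = [82.0, 45.0, 67.0, 30.0, 90.0]
actual_pci = [85.0, 50.0, 60.0, 25.0, 95.0]
r = pearson_r(predicted_pci, actual_pci)
print(f"correlation: {r:.3f}")
```

A value near 1 indicates that predictions rise and fall with the ground truth. It does not by itself show that the predictions are unbiased, so an error measure such as mean absolute error is a useful companion.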


PCI Estimation Network


Dense Captioning Network


Overall Dense Pavement Captioning Network


Dataset Structure

PaveCap Dataset Structure
Structure of our multimodal dataset. The dataset is built on pavement images from the DSPS 24 Competition, which originally provided only a Pavement Condition Index (PCI) value for each image. For our research, we manually annotated each image with bounding boxes that identify specific pavement distresses. We also added textual annotations that classify these distresses based on their presence or absence. Throughout the annotation, we took care to account for severity levels, which play a crucial role in accurate pavement condition assessment.
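The three annotation layers described above (an image-level PCI, bounding boxes, and textual annotations with severity) could be represented roughly as follows. This is a sketch under stated assumptions, not the dataset's actual file format: every class name, field name, label, path, and value here is hypothetical, and the box layout `(x_min, y_min, x_max, y_max)` is assumed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DistressAnnotation:
    """One annotated distress region (all field names are hypothetical)."""
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels
    distress_type: str                      # illustrative label, e.g. "alligator crack"
    severity: str                           # illustrative level, e.g. "low" / "medium" / "high"
    caption: str                            # textual annotation for this region


@dataclass
class PavementSample:
    """One image with its image-level PCI and region-level annotations."""
    image_path: str
    pci: float                              # PCI is conventionally reported on a 0-100 scale
    distresses: List[DistressAnnotation] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the PCI or any bounding box is implausible."""
        if not 0.0 <= self.pci <= 100.0:
            raise ValueError(f"PCI out of range: {self.pci}")
        for d in self.distresses:
            x_min, y_min, x_max, y_max = d.box
            if x_min >= x_max or y_min >= y_max:
                raise ValueError(f"degenerate bounding box: {d.box}")


sample = PavementSample(
    image_path="images/example_0001.jpg",   # made-up path
    pci=62.0,                               # made-up value
    distresses=[
        DistressAnnotation(
            box=(120.0, 40.0, 380.0, 210.0),
            distress_type="alligator crack",
            severity="medium",
            caption="medium severity alligator crack",
        )
    ],
)
sample.validate()
```

An image without visible distress would simply carry an empty `distresses` list, which is one way to encode the "absence" case mentioned above.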

Generated Captions

Evaluation Metrics for Dense Captioning

We evaluated our dense captioning model on our test data using several standard metrics: BLEU (with n-gram orders 1 to 4), GLEU, and METEOR. The following table presents the mean and standard deviation of each metric over the test results.

Metric    Mean      Standard Deviation
BLEU-1    0.7445    ±0.195
BLEU-2    0.6766    ±0.212
BLEU-3    0.6130    ±0.238
BLEU-4    0.5690    ±0.261
GLEU      0.5893    ±0.241
METEOR    0.7252    ±0.199
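To make the BLEU rows concrete, here is a minimal sketch of cumulative BLEU-n for a single caption pair, followed by the mean and standard deviation over a set of pairs. It assumes whitespace tokenization, a single reference per caption, uniform n-gram weights, no smoothing, and the sample standard deviation. The page does not state which implementation or settings produced the reported numbers, so this sketch may not reproduce them exactly. The captions are made up for illustration.

```python
import math
from collections import Counter
from statistics import mean, stdev


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Cumulative BLEU up to order max_n, one reference, no smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Made-up (generated, ground truth) caption pairs for illustration only
pairs = [
    ("low severity longitudinal crack", "medium severity longitudinal crack"),
    ("high severity pothole on the surface", "high severity pothole on the surface"),
]
bleu1_scores = [bleu(c, r, max_n=1) for c, r in pairs]
print(f"BLEU-1 mean: {mean(bleu1_scores):.4f}, std: {stdev(bleu1_scores):.3f}")
```

Because the score is a geometric mean of the n-gram precisions, each higher order can only lower or preserve the per-caption value, which is consistent with the decrease from BLEU-1 to BLEU-4 in the table.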