For the campus navigation module, a custom dataset was developed to capture the environmental challenges specific to university campuses. Data collection consisted of video recordings captured at multiple locations across the Hacettepe University campus.
The ATM dataset consists of:
YOLOv8 was used for the real-time object detection tasks. It optimizes a composite loss function:
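In the standard Ultralytics YOLOv8 formulation, this objective is a weighted sum of three terms, with the λ weights set by the box, cls, and dfl gain hyperparameters:

L_total = λ_box · L_CIoU + λ_cls · L_BCE + λ_dfl · L_DFL

where L_CIoU is the Complete-IoU bounding-box regression loss, L_BCE is the binary cross-entropy classification loss, and L_DFL is the distribution focal loss.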
Training Parameters:
Two VLMs were explored for the visual question answering component.
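As an illustration of the VQA interface (BLIP is used here only as an example model, not necessarily one of the two explored), a VLM can be queried through Hugging Face Transformers as follows:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Example VLM; the actual models used in the project may differ.
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("frame.jpg").convert("RGB")   # placeholder camera frame
inputs = processor(image, "Where is the exit?", return_tensors="pt")
output = model.generate(**inputs)
print(processor.decode(output[0], skip_special_tokens=True))
```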
Camera Input → YOLOv8 → Task Decision → ATM/Navigation/VQA → OCR/Text Detection → Voice Output
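To make this flow concrete, the sketch below routes one camera frame through detection, a task decision, the chosen task handler, and voice output. Every identifier here (decide_task, handlers, tts) is a hypothetical stand-in for the project's actual modules:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # placeholder weights

def decide_task(boxes):
    """Map detected object classes to a task label (illustrative logic)."""
    labels = {model.names[int(b.cls)] for b in boxes}
    if "atm" in labels:
        return "atm"
    if "sign" in labels or "door" in labels:
        return "navigation"
    return "vqa"

def process_frame(frame, handlers, tts):
    results = model(frame)[0]               # YOLOv8 inference on one frame
    task = decide_task(results.boxes)       # Task Decision stage
    text = handlers[task](frame, results)   # ATM / Navigation / VQA handler
    tts(text)                               # Voice Output stage
```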
Training was carried out with the Ultralytics YOLOv8 CLI on the custom datasets.
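As an illustration, the equivalent Ultralytics Python API call is sketched below; the dataset path and hyperparameter values are placeholders rather than the project's actual training configuration:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")              # pretrained checkpoint
model.train(
    data="campus_dataset/data.yaml",    # hypothetical dataset config
    epochs=100,                         # illustrative value
    imgsz=640,                          # illustrative value
    batch=16,                           # illustrative value
)
```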
Detection performance was evaluated with mean Average Precision (mAP):

mAP@0.5 = (1/|C|) ∑_{c ∈ C} AP_c

where C is the set of object classes and AP_c is the area under the precision-recall curve for class c at an IoU threshold of 0.5.
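For reference, the per-class AP term can be computed from a precision-recall curve with all-point interpolation, as in the sketch below (array contents are illustrative):

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve (all-point interpolation).

    `recall` and `precision` are the curve for one class at IoU 0.5,
    ordered by descending detection confidence.
    """
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas wherever recall increases.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# mAP@0.5 is then the mean of average_precision over all classes.
```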
VQA answer quality was evaluated with BLEU, METEOR, and human grading.
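The scoring implementation is not specified here; as one concrete option, BLEU and METEOR can be computed with NLTK (the example strings are placeholders):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

reference = "the exit is to your left".split()    # placeholder ground truth
hypothesis = "exit is on your left".split()       # placeholder model answer

# Smoothing matters for short VQA answers, which may lack 3- and 4-grams.
bleu = sentence_bleu([reference], hypothesis,
                     smoothing_function=SmoothingFunction().method1)
meteor = meteor_score([reference], hypothesis)    # requires nltk wordnet data
print(f"BLEU: {bleu:.3f}  METEOR: {meteor:.3f}")
```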
OCR accuracy was measured with the Character Error Rate (CER):

CER = (S + D + I) / N

where S, D, and I are the numbers of character substitutions, deletions, and insertions needed to turn the OCR output into the reference text, and N is the number of characters in the reference.
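A self-contained sketch of this computation via character-level Levenshtein distance, illustrating the formula rather than the project's actual evaluation script:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: minimum edits divided by reference length."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                                # i deletions
    for j in range(m + 1):
        dp[0][j] = j                                # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[n][m] / max(n, 1)

print(cer("EXIT 2ND FLOOR", "EX1T 2ND FLO0R"))     # 2 edits / 14 chars ≈ 0.143
```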
Per-frame processing time was measured for detection, OCR, VQA, and TTS, with a target latency of under 500 ms per task.
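A minimal sketch of how such per-stage timing can be collected against the 500 ms budget; timed and the commented stage calls are hypothetical helpers, not the project's profiler:

```python
import time

def timed(stage, fn, *args, budget_ms=500.0, **kwargs):
    """Run one pipeline stage, report its latency, and flag budget overruns."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    status = "OK" if elapsed_ms < budget_ms else "OVER BUDGET"
    print(f"{stage}: {elapsed_ms:.1f} ms [{status}]")
    return result

# Usage with hypothetical stage functions:
# boxes = timed("detection", model, frame)
# text  = timed("ocr", run_ocr, frame)
```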
The Beyond Vision system was developed through a multi-phase methodology that emphasizes real-world performance. By combining YOLOv8, VLMs, OCR, and a modular design, we created a scalable and adaptive assistive solution for visually impaired users in academic environments.