2. Data Documentation
2.1. Overview
The OSA (Out of Stock Alert) project draws on two data sources: shelf images collected in real retail stores and manually annotated for void detection, and an open-source SKU dataset from Kaggle used for product detection.
2.2. Data Sources and Collection
2.2.1. Real-World Retail Environment Data
Our primary data collection focused on capturing authentic retail scenarios from major supermarket chains in Morocco.
Data Collection Sites
Marjane, Meknes
Atakadao, Meknes
Collection Methodology
Systematic aisle-by-aisle documentation
Multiple visits at different times
Various lighting conditions and shelf states
Comprehensive coverage across product categories
2.3. Void Detection Dataset Development
2.3.1. Annotation with Roboflow
The void detection dataset consists of images collected at Marjane and Atakadao stores in Meknes and manually annotated in Roboflow. We used Roboflow’s collaborative annotation tools and quality control features to keep the void labels consistent across annotators.
Void Detection Types
Complete Void: Entire shelf section with no products
Partial Void: Sections with reduced product density
Gap Void: Small gaps between products
Edge Void: Empty spaces at shelf edges
Figure: Examples of different void types and annotations
Annotation Process
Initial labeling of void areas
Quality review by experienced annotators
Final validation for consistency
Integration into training dataset
Custom Void Detection Dataset
Dataset Name: OSA Void Detection Dataset
Description: Custom-built dataset for detecting empty spaces on retail shelves
Format: YOLO format annotations via Roboflow
Size: Real-world retail images with void annotations
Source: Images collected at Marjane and Atakadao stores in Meknes, annotated manually
Annotation Tool: Roboflow platform
Link: Void Detection Dataset
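For readers unfamiliar with the YOLO label format, each line of a label file holds a class index followed by the box center, width, and height, all normalized to the image size. The sketch below parses one such line and converts it to pixel corners. The class order in VOID_CLASSES and the helper name parse_yolo_line are assumptions made for illustration; the actual class indices are whatever the Roboflow export defines.

```python
from dataclasses import dataclass

# Assumed class order, for illustration only. The real index-to-name
# mapping is defined by the dataset export, not by this list.
VOID_CLASSES = ["complete_void", "partial_void", "gap_void", "edge_void"]


@dataclass
class Box:
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one YOLO label line to a pixel-space box.

    A YOLO line reads: <class_id> <x_center> <y_center> <width> <height>,
    with the four geometry values normalized to the range [0, 1].
    """
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    return Box(
        label=VOID_CLASSES[int(class_id)],
        x_min=(xc - w / 2) * img_w,
        y_min=(yc - h / 2) * img_h,
        x_max=(xc + w / 2) * img_w,
        y_max=(yc + h / 2) * img_h,
    )


if __name__ == "__main__":
    # A box centered in a 640x640 image, half as wide and a quarter as tall.
    print(parse_yolo_line("0 0.5 0.5 0.5 0.25", 640, 640))
```

Converting back to pixel coordinates like this is mainly useful for spot-checking annotations by drawing them over the source image.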
2.4. Product Detection Dataset
2.4.1. Open Source SKU Dataset from Kaggle
For product detection, we use an open-source SKU dataset from Kaggle that covers a broad range of product categories and ships with bounding box annotations.
Dataset Characteristics
High-resolution product images
Extensive product category coverage
YOLO-style annotations with detailed classifications
Multi-level product hierarchy
Integration Process
Format standardization and category mapping
Quality filtering and metadata enrichment
Image normalization and augmentation
Performance benchmarking
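The category mapping and quality filtering steps above can be sketched as a rewrite of YOLO label lines from the source dataset's class indices to a project-specific class list. This is a minimal sketch under stated assumptions, not the project's actual integration code: the function name remap_label_lines and all category names in the usage example are hypothetical.

```python
def remap_label_lines(lines, source_names, name_map, target_names):
    """Rewrite YOLO label lines from source class ids to target class ids.

    lines:        YOLO label lines ("<class_id> <xc> <yc> <w> <h>")
    source_names: class names of the source dataset, indexed by class id
    name_map:     source class name -> target class name
    target_names: class names of the target dataset, indexed by class id

    Malformed lines and lines whose class has no mapping are dropped,
    which acts as a simple quality filter.
    """
    target_index = {name: i for i, name in enumerate(target_names)}
    remapped = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # malformed line: not exactly class id + 4 values
        source_name = source_names[int(parts[0])]
        target_name = name_map.get(source_name)
        if target_name is None:
            continue  # category not used in the target dataset
        remapped.append(" ".join([str(target_index[target_name])] + parts[1:]))
    return remapped


if __name__ == "__main__":
    # Hypothetical category names, for illustration only.
    source = ["cola_can", "cola_bottle", "shampoo"]
    target = ["beverage", "personal_care"]
    mapping = {"cola_can": "beverage", "cola_bottle": "beverage"}
    labels = ["0 0.5 0.5 0.2 0.3", "2 0.1 0.1 0.05 0.05", "1 0.7 0.4 0.1 0.2"]
    print(remap_label_lines(labels, source, mapping, target))
```

Mapping by class name rather than by raw index keeps the mapping readable and avoids silent errors if the source dataset's class order changes.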
Open Source SKU Dataset
Dataset Name: Open Source SKU Dataset
Description: Comprehensive collection of retail product images with bounding box annotations for product detection
Format: YOLOv5 format (image files with matching label files)
Size: Multiple product categories with detailed annotations
Source: Kaggle open source community
Link: SKU Detection Dataset
2.5. Data Preprocessing Pipeline
Standard Processing Steps
Image format standardization (JPEG, PNG)
Resolution normalization (640x640 for YOLO models)
Annotation format conversion (COCO → YOLO)
Dataset splitting (train/validation/test: 70/20/10)
Data augmentation and quality checks
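Two of the steps above, annotation conversion and dataset splitting, can be illustrated with a short sketch. COCO stores a box as [x_min, y_min, width, height] in pixels, while YOLO stores the normalized center, width, and height. The function names, the fixed seed, and the file names in the usage example are assumptions for illustration, not the project's actual pipeline code.

```python
import random


def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to
    YOLO (x_center, y_center, width, height), normalized to [0, 1]."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle items reproducibly and split them into train/val/test.

    The default ratios follow the 70/20/10 split described above.
    Whatever remains after the train and validation slices goes to test.
    """
    items = sorted(items)  # fixed starting order so the seed is meaningful
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    print(coco_to_yolo([160, 240, 320, 160], 640, 640))
    # Hypothetical file names, for illustration only.
    images = [f"shelf_{i:03d}.jpg" for i in range(100)]
    train, val, test = split_dataset(images)
    print(len(train), len(val), len(test))
```

Splitting by image file, with a fixed seed, keeps the partition reproducible across runs. If several images show the same shelf at nearly the same moment, grouping them into the same split would be a reasonable refinement to avoid leakage between training and evaluation data.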