2. Data Documentation

2.1. Overview

The OSA (Out of Stock Alert) project combines two data sources: real-world retail imagery annotated for void detection, and an open-source SKU dataset used for product recognition.

Figure: Data collection and processing pipeline overview

2.2. Data Sources and Collection

2.2.1. Real-World Retail Environment Data

Primary data collection captured in-store shelf images at major supermarket chains in Morocco.

Data Collection Sites

Figure: Data collection at retail locations

Collection Methodology

  • Systematic aisle-by-aisle documentation

  • Multiple visits at different times

  • Various lighting conditions and shelf states

  • Comprehensive coverage across product categories

2.3. Void Detection Dataset Development

2.3.1. Annotation with Roboflow

The void detection dataset was manually annotated in Roboflow from images collected at Marjane and Atacadao stores in Meknes. Roboflow's collaborative annotation tools and quality control features were used to keep the annotations consistent across annotators.

Void Detection Types

  • Complete Void: Entire shelf section with no products

  • Partial Void: Sections with reduced product density

  • Gap Void: Small gaps between products

  • Edge Void: Empty spaces at shelf edges

Figure: Examples of different void types and annotations

Annotation Process

  1. Initial labeling of void areas

  2. Quality review by experienced annotators

  3. Final validation for consistency

  4. Integration into training dataset

Custom Void Detection Dataset

  • Dataset Name: OSA Void Detection Dataset

  • Description: Custom-built dataset for detecting empty spaces on retail shelves

  • Format: YOLO format annotations via Roboflow

  • Size: Real-world retail images with void annotations

  • Source: Images collected at Marjane and Atacadao stores in Meknes, annotated manually

  • Annotation Tool: Roboflow platform

  • Link: Void Detection Dataset
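
As a rough illustration of the YOLO annotation format listed above, each label file holds one line per box: a class ID followed by the box center, width, and height, all normalized to the image size. The sketch below converts such a line to pixel coordinates. The class IDs in `VOID_CLASSES` are hypothetical and assume one class per void type; the actual class list is defined by the dataset itself.

```python
from dataclasses import dataclass

# Hypothetical class IDs, assuming one class per void type described above.
# The real mapping comes from the dataset's own class list.
VOID_CLASSES = {
    0: "complete_void",
    1: "partial_void",
    2: "gap_void",
    3: "edge_void",
}


@dataclass
class PixelBox:
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> PixelBox:
    """Convert one 'class cx cy w h' line (normalized) to pixel corners."""
    class_id, cx, cy, w, h = line.split()
    # Scale the normalized values back to pixels.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return PixelBox(
        label=VOID_CLASSES[int(class_id)],
        x_min=cx - w / 2,
        y_min=cy - h / 2,
        x_max=cx + w / 2,
        y_max=cy + h / 2,
    )


box = parse_yolo_line("2 0.5 0.5 0.25 0.5", img_w=640, img_h=640)
# box.label == "gap_void"; corners are (240.0, 160.0) and (400.0, 480.0)
```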

2.4. Product Detection Dataset

2.4.1. Open Source SKU Dataset from Kaggle

For product detection, we integrated an open-source SKU dataset from Kaggle that covers a wide range of product categories and ships with bounding box annotations.

Dataset Characteristics

  • High-resolution product images

  • Extensive product category coverage

  • YOLO-style annotations with detailed classifications

  • Multi-level product hierarchy

Kaggle SKU dataset overview

Integration Process

  • Format standardization and category mapping

  • Quality filtering and metadata enrichment

  • Image normalization and augmentation

  • Performance benchmarking
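
The category mapping and quality filtering steps above could look roughly like the following. This is a minimal sketch, not the project's actual integration code: `CATEGORY_MAP` and `MIN_BOX_AREA` are hypothetical values chosen for illustration, and the real mapping depends on the class lists of both datasets.

```python
# Hypothetical mapping from source-dataset class IDs to project class IDs.
# Here two source classes are merged into project class 0.
CATEGORY_MAP = {0: 0, 1: 0, 2: 1}

# Assumed quality threshold: drop boxes smaller than this normalized area.
MIN_BOX_AREA = 0.0005


def remap_labels(lines):
    """Remap class IDs in YOLO label lines and drop unusable boxes."""
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # malformed line
        src_id = int(parts[0])
        width, height = float(parts[3]), float(parts[4])
        if src_id not in CATEGORY_MAP:
            continue  # class not used by the project
        if width * height < MIN_BOX_AREA:
            continue  # box too small to be a reliable annotation
        kept.append(" ".join([str(CATEGORY_MAP[src_id]), *parts[1:]]))
    return kept


cleaned = remap_labels([
    "1 0.5 0.5 0.2 0.2",     # kept, remapped to class 0
    "7 0.5 0.5 0.2 0.2",     # dropped: unmapped class
    "2 0.1 0.1 0.01 0.01",   # dropped: below the area threshold
    "bad line",              # dropped: malformed
])
```

Keeping the mapping in a single dictionary makes it easy to review which source categories were merged or discarded.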

Open Source SKU Dataset

  • Dataset Name: Open Source SKU Dataset

  • Description: Comprehensive collection of retail product images with bounding box annotations for product detection

  • Format: YOLOv5 format with image files

  • Size: Multiple product categories with detailed annotations

  • Source: Kaggle open source community

  • Link: SKU Detection Dataset

2.5. Data Preprocessing Pipeline

Standard Processing Steps

  1. Image format standardization (JPEG, PNG)

  2. Resolution normalization (640x640 for YOLO models)

  3. Annotation format conversion (COCO → YOLO)

  4. Dataset splitting (train/validation/test: 70/20/10)

  5. Data augmentation and quality checks
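
Steps 3 and 4 can be sketched as below. COCO stores boxes as `[x_min, y_min, width, height]` in pixels, while YOLO expects a normalized center, width, and height. The function names, the fixed seed, and the use of rounding for the split sizes are assumptions made for this example, not the project's actual pipeline.

```python
import random


def coco_to_yolo(bbox, img_w, img_h):
    """COCO [x_min, y_min, width, height] in pixels -> YOLO (cx, cy, w, h), normalized."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / img_w,
        (y_min + height / 2) / img_h,
        width / img_w,
        height / img_h,
    )


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle deterministically, then cut into train/validation/test."""
    items = sorted(items)                # stable starting order
    random.Random(seed).shuffle(items)   # assumed seed, for reproducibility
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # remainder goes to test
    return train, val, test


yolo_box = coco_to_yolo([160, 160, 320, 320], img_w=640, img_h=640)
# yolo_box == (0.5, 0.5, 0.5, 0.5)

names = [f"img_{i:02d}.jpg" for i in range(10)]
train, val, test = split_dataset(names)
# 10 images split into 7 / 2 / 1
```

Splitting by image file (rather than by annotation) keeps all boxes from one image in the same subset. If several images show the same shelf, splitting by store visit instead would further reduce leakage between subsets.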