🎯 Crowd Counting by ZIP

Upload an image and get precise crowd density predictions with ZIP models!

ZIP (Zero-Inflated Poisson) is a framework designed for crowd counting, a task where the goal is to estimate how many people are present in an image. It was introduced in the paper ZIP: Scalable Crowd Counting via Zero-Inflated Poisson Modeling. ZIP is based on a simple idea: not all empty areas in an image mean the same thing. Some regions are empty because there are truly no people there (like walls or sky), while others are places where people could appear but just happen not to in this particular image. ZIP separates these two cases using two prediction heads:

  • Structural Zeros: These are regions that can never contain a head annotation, such as the background or people's torsos (annotations mark heads only). These are handled by the π head.
  • Sampling Zeros: These are regions where people could appear but don't in this image. These are modeled by the λ head.

By separating where people are likely to be from how many are present, ZIP produces more accurate and interpretable crowd estimates, especially in scenes with large empty spaces or varied crowd densities.
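To make the two kinds of zeros concrete, here is a minimal Python sketch of the standard Zero-Inflated Poisson probability mass function for a single region. It assumes π is the probability that the region is a structural zero and λ is the Poisson intensity. The function name `zip_pmf` is illustrative only and is not taken from the ZIP codebase.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Probability of observing k people in one region under a
    Zero-Inflated Poisson model (hypothetical helper).

    pi  -- probability that the region is a structural zero
    lam -- Poisson intensity (expected count if people can appear)
    """
    if k < 0:
        return 0.0
    # Poisson probability of exactly k people, given the region can hold people
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero is either structural (probability pi) or a sampling zero:
        # the region can hold people, but the Poisson draw happened to be 0
        return pi + (1.0 - pi) * poisson_k
    # Non-zero counts can only come from the Poisson component
    return (1.0 - pi) * poisson_k
```

For example, with π = 0.5 and λ = 2, a zero count has probability 0.5 + 0.5·e^(-2) ≈ 0.568, and most of that mass comes from the structural-zero branch.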

Choose from different model variants: ZIP-B (Base), ZIP-S (Small), ZIP-T (Tiny), ZIP-N (Nano), ZIP-P (Pico)


Step-by-step Guide:

  1. 🎛️ Select Model: Choose your preferred model variant, pre-training dataset, and pre-training evaluation metric from the dropdown menus
  2. 📸 Upload Image: Click the image area to upload your crowd photo, or paste one from the clipboard
  3. 🚀 Analyze: Click the "Analyze Crowd" button to start processing
  4. 📊 View Results: Examine the density maps and crowd count in the output panels

Understanding the Outputs:

📊 Main Results:

  • 🎯 Density Map: Shows where people are located, with color intensity indicating crowd density; computed as (1-π) * λ
  • 🧙 Predicted Count: Total number of people estimated in the image, obtained by summing the density map
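A minimal sketch of how the two main outputs relate, assuming the π and λ maps are available as equal-sized 2D lists with one value per region. The function name `density_and_count` is hypothetical and is not the app's API.

```python
def density_and_count(pi_map, lam_map):
    """Combine per-region pi and lambda maps (lists of rows) into a
    density map (1 - pi) * lambda and the total predicted count."""
    # Expected people per region: lambda scaled by the chance the region
    # is NOT a structural zero
    density = [
        [(1.0 - p) * lam for p, lam in zip(pi_row, lam_row)]
        for pi_row, lam_row in zip(pi_map, lam_map)
    ]
    # The predicted count is the sum of expected counts over all regions
    count = sum(sum(row) for row in density)
    return density, count
```

With π = [[0.9, 0.1], [0.0, 0.5]] and λ = [[3.0, 2.0], [0.5, 4.0]], the count is about 4.6. The top-left region contributes only about 0.3 people despite λ = 3.0, because π = 0.9 marks it as almost certainly a structural zero.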

🔍 Zero Analysis:

  • 🏗️ Structural Zero Map: Indicates regions that structurally cannot contain head annotations (e.g., walls, sky, torso, or background). These are governed by the π head, which estimates the probability that a region never contains people.
  • 📊 Sampling Zero Map: Shows areas where people could be present but happen not to appear in the current image. These zeros are modeled by (1-π) * exp(-λ), which is high where the expected count λ is near zero.
  • 👺 Complete Zero Map: A combined visualization of zero probabilities, capturing both structural and sampling zeros, i.e., π + (1-π) * exp(-λ). This map reflects the overall probability that each region contains no people.
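The three zero maps can be sketched per region as follows. This assumes the Structural Zero Map displays π directly, the Sampling Zero Map displays (1-π) * exp(-λ), and the Complete Zero Map is their sum, π + (1-π) * exp(-λ), i.e. the total ZIP probability of a zero count. The function name `zero_maps` is hypothetical.

```python
import math


def zero_maps(pi_map, lam_map):
    """Return (structural, sampling, complete) zero-probability maps.

    pi_map and lam_map are equal-sized 2D lists holding, per region, the
    structural-zero probability pi and the Poisson intensity lambda.
    """
    structural, sampling, complete = [], [], []
    for pi_row, lam_row in zip(pi_map, lam_map):
        # Structural zeros: probability the region can never hold people
        s_row = [float(p) for p in pi_row]
        # Sampling zeros: region can hold people, but the Poisson draw is 0
        z_row = [(1.0 - p) * math.exp(-lam) for p, lam in zip(pi_row, lam_row)]
        # Complete zeros: total probability of observing zero people
        c_row = [s + z for s, z in zip(s_row, z_row)]
        structural.append(s_row)
        sampling.append(z_row)
        complete.append(c_row)
    return structural, sampling, complete
```

A region with π = 0.2 and λ = 0 has a sampling-zero probability of 0.8 and a complete-zero probability of 1.0: it is certainly empty, but mostly because no one happened to be there rather than because no one could be.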

🔥 Hotspots:

  • 📈 Lambda Map: Highlights areas with high expected crowd density. Each value is the Poisson intensity λ, the expected number of people in that region if people can appear there at all. This map focuses on how many people are likely to be present and does NOT account for whether people can actually appear there (that is the job of the π head). ⚠️ The Lambda Map must be combined with the Structural Zero Map as (1-π) * λ to produce the final density map.
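The (1-π) * λ combination follows from the mean of the ZIP distribution. Writing Y for the number of people in a region, the standard ZIP probabilities and the resulting expectation are:

```latex
\begin{aligned}
P(Y = 0) &= \pi + (1 - \pi)\, e^{-\lambda}, \\
P(Y = k) &= (1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k \ge 1, \\
\mathbb{E}[Y] &= \sum_{k \ge 1} k\,(1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!}
  = (1 - \pi)\,\lambda \sum_{k \ge 1} \frac{\lambda^{k-1} e^{-\lambda}}{(k-1)!}
  = (1 - \pi)\,\lambda.
\end{aligned}
```

The last sum equals 1 because it runs over every Poisson probability. Since (1-π)λ ≤ λ, reading the Lambda Map on its own overestimates counts wherever π is large.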

Model Variants:

  • ZIP-B: Base model with the highest accuracy
  • ZIP-S: Small model for faster inference
  • ZIP-T: Tiny model for resource-constrained environments
  • ZIP-N: Nano model for mobile applications
  • ZIP-P: Pico model for edge devices

Pre-training Datasets:

  • ShanghaiTech A: Dense, low-resolution crowd scenes
  • ShanghaiTech B: Sparse, high-resolution crowd scenes
  • UCF-QNRF: Dense, ultra-high-resolution crowd images
  • NWPU-Crowd: The largest ultra-high-resolution crowd counting dataset

Pre-training Evaluation Metrics:

  • MAE: Mean Absolute Error, the average absolute counting error per image
  • NAE: Normalized Absolute Error, the average absolute counting error divided by the ground-truth count (a relative error)
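Both metrics compare predicted and ground-truth counts per image. Below is a minimal sketch under the usual definitions, assuming counts are given as equal-length lists and, for NAE, that every ground-truth count is positive (an image with zero people would make the ratio undefined). The function names `mae` and `nae` are hypothetical.

```python
def mae(pred_counts, true_counts):
    """Mean Absolute Error: average of |predicted - true| over all images."""
    if len(pred_counts) != len(true_counts) or not true_counts:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - t) for p, t in zip(pred_counts, true_counts))
    return total / len(true_counts)


def nae(pred_counts, true_counts):
    """Normalized Absolute Error: average of |predicted - true| / true.

    Assumes every ground-truth count is positive.
    """
    if len(pred_counts) != len(true_counts) or not true_counts:
        raise ValueError("need two non-empty lists of equal length")
    if any(t <= 0 for t in true_counts):
        raise ValueError("NAE is undefined for images with zero people")
    total = sum(abs(p - t) / t for p, t in zip(pred_counts, true_counts))
    return total / len(true_counts)
```

For predictions [110, 45] against ground truth [100, 50], MAE is 7.5 and NAE is 0.1: the absolute errors differ (10 vs. 5 people), but both images are off by 10% of their true count.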