John Roest

Drone detection with code!

Sat Jan 24 2026

From theory to practice with a real‑time drone detector

Object detection is one of the most compelling applications of computer vision. Unlike simple image classification, where a model only tells you what is present in an image, object detection also determines where those objects are located. This enables systems that can actively respond to their environment, such as surveillance cameras, traffic monitoring solutions, or autonomous systems.

In this article, we explore how to use YOLO (You Only Look Once) for object detection in Python. Rather than staying at a high‑level overview, we will walk through a complete, working example: a drone detector capable of analyzing both live camera feeds and recorded video files.

The goal is not just to make something that works, but to understand why this architecture makes sense and how it can be extended or adapted for real‑world use.


What is YOLO?

YOLO, short for You Only Look Once, is a family of neural networks designed specifically for fast and efficient object detection. The core idea behind YOLO is that an image is processed only once by a single neural network, which simultaneously predicts object classes and their bounding boxes.

This is fundamentally different from earlier detection pipelines that first generated region proposals and then classified them. By merging these steps into a single pass, YOLO achieves significantly lower latency, making it highly suitable for real‑time applications.

In this project, we use YOLOv8, accessed via the ultralytics Python library. YOLOv8 is modern, actively maintained, and easy to integrate, while still being powerful enough to run on consumer hardware.
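
To make the single-pass idea concrete, here is a minimal inference sketch using the ultralytics API. The image path example.jpg is a placeholder, and the yolov8n.pt weights are downloaded automatically on first use.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # load the nano model (auto-downloads)
results = model("example.jpg")  # a single forward pass over the image

# Classes, confidences, and bounding boxes all come back together.
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls[0])]
    print(cls_name, float(box.conf[0]), box.xyxy[0].tolist())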


Why YOLO is well‑suited for drone detection

Detecting drones is a challenging task. Drones are relatively small, often move quickly, and usually appear against complex backgrounds such as skies, buildings, or natural landscapes. These characteristics require a detection model that is both fast and robust.

YOLO is a strong fit for this use case because it:

  • operates in real time with low latency
  • detects multiple objects simultaneously
  • scales well across different model sizes
  • can be fine‑tuned with custom training data

Although this implementation uses a pre‑trained model, the structure allows you to seamlessly swap in a custom drone‑specific model later on.
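As a rough sketch of what that swap involves: fine-tuning with ultralytics takes a dataset description file, where drone.yaml below is a hypothetical config pointing at your annotated drone images.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # start from the pretrained weights
model.train(data="drone.yaml", epochs=50, imgsz=640)
# Training writes new weights (e.g. best.pt) under runs/detect/,
# which can then replace MODEL_PATH in the code below.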


Overall system architecture

Before diving into the full code, it is useful to understand how the application is structured. Conceptually, the system consists of four main components:

  1. Loading and configuring the YOLO model
  2. Performing object detection on individual frames
  3. Offline analysis of video files
  4. Real‑time detection with visual feedback

On top of that, the code emphasizes clear data modeling and type safety, which helps keep the system maintainable as it grows.
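
As a quick roadmap, the four components map onto the full listing at the end of this article roughly like this (clip.mp4 is a placeholder file name):

model = YOLO(MODEL_PATH)     # 1. loading and configuring the model
detect_frame(frame)          # 2. detection on a single frame
scan_video("clip.mp4")       # 3. offline analysis of a video file
run_realtime(CAMERA_INDEX)   # 4. real-time detection with display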


Loading the YOLO model

Everything starts with loading a YOLO model. In this example, we use yolov8n.pt, the nano variant of YOLOv8. This model is lightweight and fast, making it ideal for real‑time experimentation and deployment on modest hardware.

By defining the model path as a configuration variable, you can easily switch to a different model (for example, a custom‑trained one) without changing the rest of the codebase.
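
In the listing at the end of this article, that amounts to just two lines:

MODEL_PATH = "yolov8n.pt"  # optionally your own trained model, e.g. "best.pt"
model = YOLO(MODEL_PATH)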


Frame‑level object detection

At the heart of the system is the function that analyzes a single video frame and extracts relevant detections. Each frame is passed directly to the YOLO model, which returns bounding boxes, class IDs, and confidence scores.

A key design choice in this implementation is that not all detected objects are equally interesting. We explicitly filter detections to only those that can reasonably be interpreted as drones, such as objects labeled drone, uav, or helicopter. Note that the stock COCO-trained yolov8n.pt includes none of these labels, so the filter only starts producing hits once a drone-specific model is swapped in.

These detections are represented as instances of a Detection dataclass rather than loose dictionaries. This provides structure, clarity, and makes it easier to enrich detections later with additional metadata such as timestamps or frame indices.
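
The filtering itself is a simple substring match (see _is_target in the listing), so variant labels are caught as well. A quick sketch of its behavior:

TARGET_CLASSES = {"drone", "uav", "helicopter"}

def _is_target(name: str) -> bool:
    return any(t in name for t in TARGET_CLASSES)

print(_is_target("mini-drone"))  # True  (substring match)
print(_is_target("airplane"))    # False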


Data modeling with dataclasses and TypedDict

Instead of passing around unstructured data, this implementation models detections explicitly. The Detection dataclass represents a single detection in memory, while TypedDict definitions describe the serialized form used for output and further processing.

This approach pays off when storing detections, exposing them via an API, or feeding them into downstream analytics pipelines. Strong typing helps catch errors early and documents intent clearly.
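
As a small illustration of the payoff, a Detection (as defined in the listing below) serializes cleanly to JSON; the values here are invented for the example.

import json
from dataclasses import asdict

det = Detection(
    frame=12,
    timestamp=0.4,
    label="drone",
    confidence=0.87,
    bbox={"x1": 10, "y1": 20, "x2": 110, "y2": 140},
)
print(json.dumps(asdict(det)))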


Offline video analysis

In addition to real‑time detection, the application supports scanning video files frame by frame. Each frame is analyzed sequentially, and detections are enriched with both frame numbers and timestamps derived from the video’s FPS.

This makes it possible to answer questions such as:

  • When did a drone first appear in the video?
  • How long was it visible?
  • How many total detections occurred?

The result is returned as a structured object that can easily be stored, inspected, or processed further.
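
A usage sketch, assuming a local recording flight.mp4 (a hypothetical file), shows how those questions fall out of the returned structure:

result = scan_video("flight.mp4")
dets = result["detections"]
print(f"{result['frames_processed']} frames, {len(dets)} detections")
if dets:
    first, last = dets[0]["timestamp"], dets[-1]["timestamp"]
    if first is not None and last is not None:
        print(f"first seen at {first:.2f}s, last seen at {last:.2f}s")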


Real‑time detection with visual feedback

The most tangible part of the system is the real‑time detection mode. Using OpenCV, the application continuously captures frames from a camera and analyzes them with YOLO.

For each detected drone, a bounding box and label with confidence score are drawn onto the frame. The current frames‑per‑second (FPS) is also displayed, providing immediate insight into performance.

Whenever a drone is detected, the system logs an event to the console. This dual feedback, visual and textual, is particularly useful for debugging, and the console logging remains valuable in deployments where no GUI is available.
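
Since cv2.VideoCapture accepts a file path as readily as a device index, the same loop can also replay recordings; test.mp4 below is a hypothetical clip.

run_realtime(0)           # live webcam feed
run_realtime("test.mp4")  # or replay a recorded clip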


Extending the system

This architecture forms a solid foundation for further development. Possible extensions include:

  • training a custom YOLO model specifically for drone detection
  • persisting detections in a database
  • triggering alerts when repeated detections occur
  • processing RTSP or IP camera streams
  • deploying the detector on edge devices such as NVIDIA Jetson platforms

Because detection, visualization, and data handling are cleanly separated, such extensions remain manageable.
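
For example, because OpenCV treats stream URLs like any other capture source, an RTSP camera drops straight in; the URL below is a placeholder for your camera's actual address.

run_realtime("rtsp://user:password@192.168.1.10:554/stream1")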


Complete Python implementation

Below is the complete Python code for the drone detector discussed in this article.

import time
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, TypedDict, Union

import cv2
from ultralytics import YOLO

# === CONFIG ===
MODEL_PATH = "yolov8n.pt"  # optionally your own trained model, e.g. "best.pt"
CAMERA_INDEX = 0            # 0 = laptop webcam, or path to a .mp4 file
CONF_THRESHOLD = 0.4        # detection confidence threshold
TARGET_CLASSES = {"drone", "uav", "helicopter"}  # labels treated as drones


class BBoxDict(TypedDict):
    x1: int
    y1: int
    x2: int
    y2: int


@dataclass
class Detection:
    """Struct for a single detection result."""

    frame: int
    timestamp: Optional[float]
    label: str
    confidence: float
    bbox: BBoxDict


class DetectionDict(TypedDict):
    frame: int
    timestamp: Optional[float]
    label: str
    confidence: float
    bbox: BBoxDict


class VideoScanResult(TypedDict):
    frames_processed: int
    detections: List[DetectionDict]


model = YOLO(MODEL_PATH)


def _is_target(name: str) -> bool:
    # Substring match, so variant labels such as "mini-drone" also qualify.
    return any(t in name for t in TARGET_CLASSES)


def detect_frame(frame) -> List[Detection]:
    """Run YOLO on a single frame and return drone detections."""
    detections: List[Detection] = []
    results = model(frame, conf=CONF_THRESHOLD, verbose=False)
    boxes = results[0].boxes

    for box in boxes:
        cls_id = int(box.cls[0])
        name = results[0].names[cls_id].lower()
        if not _is_target(name):
            continue

        conf = float(box.conf[0])
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        detections.append(
            Detection(
                frame=-1,        # placeholder; the caller fills in the index
                timestamp=None,  # and timestamp once they are known
                label=name,
                confidence=conf,
                bbox={"x1": x1, "y1": y1, "x2": x2, "y2": y2},
            )
        )

    return detections


def _detection_to_dict(detection: Detection) -> DetectionDict:
    """Convert dataclass Detection to its TypedDict counterpart."""
    return DetectionDict(
        frame=detection.frame,
        timestamp=detection.timestamp,
        label=detection.label,
        confidence=detection.confidence,
        bbox=BBoxDict(
            x1=detection.bbox["x1"],
            y1=detection.bbox["y1"],
            x2=detection.bbox["x2"],
            y2=detection.bbox["y2"],
        ),
    )


def scan_video(source: Union[str, Path]) -> VideoScanResult:
    """Scan a video file and return all drone detections."""
    cap = cv2.VideoCapture(str(source))
    if not cap.isOpened():
        raise RuntimeError("Unable to open video source")

    fps = cap.get(cv2.CAP_PROP_FPS) or 0.0
    frames_processed = 0
    detections: List[DetectionDict] = []

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame_detections = detect_frame(frame)
        for detection in frame_detections:
            detection.frame = frames_processed
            detection.timestamp = frames_processed / fps if fps else None
            detections.append(_detection_to_dict(detection))

        frames_processed += 1

    cap.release()
    return {"frames_processed": frames_processed, "detections": detections}


def run_realtime(camera_index: Union[int, str] = CAMERA_INDEX) -> None:
    """Run real‑time drone detection using a camera feed."""
    cap = cv2.VideoCapture(camera_index)
    if not cap.isOpened():
        raise RuntimeError("Unable to open camera")

    prev_time = time.time()  # seed so the first FPS reading is sensible
    drone_count = 0
    print("🔍 Drone detector started... press [ESC] to stop.")

    while True:
        ret, frame = cap.read()
        if not ret:
            print("⚠️ No frame received — restarting camera...")
            cap.release()
            time.sleep(1)
            cap = cv2.VideoCapture(camera_index)
            continue

        frame_detections = detect_frame(frame)
        detected = bool(frame_detections)

        for detection in frame_detections:
            box = detection.bbox
            cv2.rectangle(
                frame,
                (box["x1"], box["y1"]),
                (box["x2"], box["y2"]),
                (0, 255, 0),
                2,
            )
            cv2.putText(
                frame,
                f"{detection.label} {detection.confidence:.2f}",
                (box["x1"], box["y1"] - 10),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.6,
                (0, 255, 0),
                2,
            )

        curr_time = time.time()
        fps = 1 / (curr_time - prev_time + 1e-6)
        prev_time = curr_time
        cv2.putText(
            frame,
            f"FPS: {fps:.1f}",
            (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX,
            1,
            (255, 255, 255),
            2,
        )

        if detected:
            drone_count += 1
            print(f"[{time.strftime('%H:%M:%S')}] 🚨 Drone detected (total: {drone_count})")

        cv2.imshow("Drone Detector", frame)
        if cv2.waitKey(1) & 0xFF == 27:
            break

    cap.release()
    cv2.destroyAllWindows()
    print("🛑 Stopped.")


if __name__ == "__main__":
    run_realtime()

Final thoughts

With relatively little code, YOLO enables you to build a powerful and extensible object detection system. By combining clear abstractions, strong typing, and a clean separation of concerns, this drone detector goes beyond a simple demo and serves as a solid foundation for real‑world computer vision applications.

YOLO is not just an impressive model; it is a practical tool for engineers who want to put computer vision to work.