John Roest

Drone detection with code!

From theory to practice with a real‑time drone detector#

Object detection is one of the most compelling applications of computer vision. Unlike image classification—which tells you what is in an image—object detection also determines where those objects are located. This enables systems that actively respond to their environment: surveillance cameras, traffic monitoring solutions, autonomous systems.

In this article, we use YOLO (You Only Look Once) for object detection in Python. Rather than a high‑level overview, we walk through a complete, working implementation: a drone detector capable of analyzing both live camera feeds and recorded video files.

The goal is not just to make it work, but to understand why this architecture makes sense and how it can be extended for real‑world use.


What is YOLO?#

YOLO is a family of neural networks designed for fast, efficient object detection. The core idea is that an image is processed once by a single neural network that simultaneously predicts object classes and their bounding boxes.

Earlier two-stage detectors, such as the R-CNN family, first generated region proposals and then classified each proposal separately. YOLO merges these steps into a single forward pass, which significantly lowers latency and makes it well suited to real-time applications.

This implementation uses YOLOv8 via the ultralytics library. YOLOv8 is modern, actively maintained, and straightforward to integrate.


Why YOLO for drone detection#

Drones are small, fast-moving targets that typically appear against complex backgrounds. These characteristics demand a detector that is both fast and robust.

YOLO fits this use case because it:

  • operates in real time with low latency
  • detects multiple objects simultaneously
  • scales across different model sizes
  • can be fine‑tuned with custom training data

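This implementation loads the pre-trained yolov8n.pt weights by default. Those weights are trained on the COCO dataset, whose 80 classes do not include drone, uav, or helicopter, so the label filter matches nothing until MODEL_PATH points at a model trained with such labels. Swapping in a custom drone-specific model requires no other code changes.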


System architecture#

The application has four main components:

  1. Loading and configuring the YOLO model
  2. Performing object detection on individual frames
  3. Offline analysis of video files
  4. Real‑time detection with visual feedback

The code emphasizes explicit data modeling and type safety to keep the system maintainable as it grows.


Model loading#

The model is loaded from a configurable path, MODEL_PATH. Using yolov8n.pt, the nano variant, keeps inference fast on modest hardware. Switching to a custom-trained model only requires changing that one variable.
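
One way to make the path configurable without editing the source is to read it from the environment. The sketch below is an assumption layered on top of the article's code: the variable name DRONE_MODEL_PATH and the helper resolve_model_path are hypothetical and do not appear in the original implementation.

```python
import os
from pathlib import Path

DEFAULT_MODEL = "yolov8n.pt"  # same default as MODEL_PATH in the article


def resolve_model_path(env_var: str = "DRONE_MODEL_PATH") -> str:
    """Return the weights path from the environment, or the default.

    DRONE_MODEL_PATH is a hypothetical variable name chosen for this sketch.
    """
    candidate = os.environ.get(env_var, "").strip()
    if not candidate:
        # Nothing configured: fall back to the stock nano weights.
        return DEFAULT_MODEL
    path = Path(candidate).expanduser()
    if not path.is_file():
        # Fail early with a clear message instead of deep inside model loading.
        raise FileNotFoundError(f"Model weights not found: {path}")
    return str(path)
```

The result could then be passed to YOLO(...) in place of the hard-coded MODEL_PATH.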


Frame‑level detection#

At the core of the system is a function that analyzes a single video frame. Each frame is passed to the YOLO model, which returns bounding boxes, class IDs, and confidence scores.

Not all detected objects are relevant. The implementation keeps only detections whose lowercased class name contains one of the target strings: drone, uav, or helicopter. Detections are represented as instances of a Detection dataclass rather than loose dictionaries, providing structure and making it straightforward to enrich them later with timestamps or frame indices.
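
Because a model with no matching class would silently yield zero detections, it can help to check up front which class names a model can produce for the target labels. The sketch below works on a plain id-to-name mapping like the one ultralytics exposes as model.names. The helper name matching_class_names and both example mappings are illustrative assumptions.

```python
from typing import Dict, Iterable, List


def matching_class_names(names: Dict[int, str], targets: Iterable[str]) -> List[str]:
    """Return the class names that contain any target substring (case-insensitive)."""
    wanted = [t.lower() for t in targets]
    return sorted(
        name for name in names.values()
        if any(t in name.lower() for t in wanted)
    )


# A hand-picked subset of COCO class names, for illustration only.
coco_subset = {0: "person", 4: "airplane", 14: "bird", 33: "kite"}
# Class names of a hypothetical custom-trained model.
custom_names = {0: "drone", 1: "bird"}

targets = {"drone", "uav", "helicopter"}
print(matching_class_names(coco_subset, targets))   # []
print(matching_class_names(custom_names, targets))  # ['drone']
```

An empty result is a strong hint that the loaded weights need to be replaced before the detector can report anything.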


Data modeling with dataclasses and TypedDict#

Detections are modeled explicitly. The Detection dataclass represents a detection in memory. TypedDict definitions describe the serialized form used for output and downstream processing.

This pays off when storing detections, exposing them via an API, or feeding them into analytics pipelines. Strong typing catches errors early and documents intent clearly.
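
As a minimal sketch of what that serialization step can look like, the standard library's dataclasses.asdict turns a dataclass instance (including its nested bbox dictionary) into plain dictionaries that json can encode. The Detection class below mirrors the fields used in this article; detection_to_json is a hypothetical helper and the example values are made up.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class Detection:
    """Mirrors the fields of the article's Detection dataclass."""

    frame: int
    timestamp: Optional[float]
    label: str
    confidence: float
    bbox: Dict[str, int]


def detection_to_json(detection: Detection) -> str:
    """Serialize a detection to a JSON string with a stable key order."""
    return json.dumps(asdict(detection), sort_keys=True)


# Made-up values, for illustration only.
example = Detection(
    frame=12,
    timestamp=0.4,
    label="drone",
    confidence=0.87,
    bbox={"x1": 10, "y1": 20, "x2": 110, "y2": 90},
)
print(detection_to_json(example))
```

The same string can be written to a log file, sent over an API, or stored in a database column without further conversion.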


Offline video analysis#

The application supports scanning recorded video files frame by frame. Each frame is analyzed sequentially, and detections are enriched with both frame numbers and timestamps derived from the video's frame rate.

This makes it possible to answer questions like: when did a drone first appear, how long was it visible, and how many detections occurred in total? The result is a structured object that can be stored, inspected, or processed further.
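
Those questions can be answered from the frame numbers alone, since a timestamp is simply the frame index divided by the frame rate. The following sketch is an assumption about how such a summary might be computed: ScanSummary and summarize are hypothetical names, and "visible time" is approximated as the number of distinct frames with a detection divided by the frame rate.

```python
from typing import Dict, List, Optional, TypedDict


class ScanSummary(TypedDict):
    total_detections: int
    first_seen: Optional[float]  # seconds from the start of the video
    last_seen: Optional[float]
    visible_seconds: float


def summarize(detections: List[Dict], fps: float) -> ScanSummary:
    """Summarize detections using their frame numbers and the video frame rate."""
    frames = sorted({d["frame"] for d in detections})
    if not frames or fps <= 0:
        # No detections, or no usable frame rate: timestamps are unknown.
        return ScanSummary(
            total_detections=len(detections),
            first_seen=None,
            last_seen=None,
            visible_seconds=0.0,
        )
    return ScanSummary(
        total_detections=len(detections),
        first_seen=frames[0] / fps,
        last_seen=frames[-1] / fps,
        # Each distinct frame with a detection counts as 1/fps seconds.
        visible_seconds=len(frames) / fps,
    )
```

For a 30 fps video with detections in frames 30, 31, and 60, this reports a first appearance at 1.0 s and a last appearance at 2.0 s.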


Real‑time detection#

Using OpenCV, the application continuously captures frames from a camera and runs them through YOLO. For each detected drone, a bounding box and confidence label are drawn onto the frame. The current frames‑per‑second is also displayed, providing immediate insight into performance.

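Each frame that contains at least one detection is also logged to the console, which is useful for debugging. Note that run_realtime as written still opens a window via cv2.imshow, so a deployment without a display would need that call removed or made optional.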
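
A per-frame FPS value tends to jitter, which makes the overlay hard to read. One common remedy, shown here as a sketch rather than as part of the original implementation, is an exponential moving average. The FpsMeter class and its alpha parameter are assumptions introduced for illustration.

```python
from typing import Optional


class FpsMeter:
    """Exponentially smoothed frames-per-second estimate."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # weight given to the newest sample
        self._last: Optional[float] = None
        self.fps = 0.0

    def tick(self, now: float) -> float:
        """Record a frame at time `now` (in seconds) and return the smoothed FPS."""
        if self._last is not None:
            delta = now - self._last
            if delta > 0:
                instant = 1.0 / delta
                if self.fps == 0.0:
                    # The first measurable interval seeds the average.
                    self.fps = instant
                else:
                    # Later samples are blended into the running value.
                    self.fps = self.alpha * instant + (1 - self.alpha) * self.fps
        self._last = now
        return self.fps
```

In a capture loop, calling meter.tick(time.perf_counter()) once per frame would give a value suitable for the on-screen label.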


Extending the system#

This architecture supports further development without structural changes:

  • training a custom YOLO model specifically for drone detection
  • persisting detections to a database
  • triggering alerts on repeated detections
  • processing RTSP or IP camera streams
  • deploying on edge hardware such as NVIDIA Jetson platforms

Detection, visualization, and data handling are cleanly separated, so extensions remain localized.
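
As an example of how localized such an extension can be, alerting on repeated detections needs only the per-frame boolean that the real-time loop already computes. The sketch below is one possible design, not part of the original code: ConsecutiveAlert and its threshold parameter are hypothetical.

```python
class ConsecutiveAlert:
    """Fire once when a drone is seen in `threshold` consecutive frames."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self._streak = 0  # number of consecutive frames with a detection

    def update(self, detected: bool) -> bool:
        """Feed one frame's result; return True on the frame the alert fires."""
        if not detected:
            # Any frame without a detection resets the streak.
            self._streak = 0
            return False
        self._streak += 1
        # Fires exactly once per streak, when the threshold is first reached.
        return self._streak == self.threshold
```

Requiring several consecutive frames filters out single-frame false positives at the cost of a short delay before the alert fires.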


Complete Python implementation#

import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, List, Optional, TypedDict, Union

import cv2
from ultralytics import YOLO

# === CONFIG ===
MODEL_PATH = "yolov8n.pt"  # optionally your own trained model, e.g. "best.pt"
CAMERA_INDEX = 0            # 0 = laptop webcam, or path to a .mp4 file
CONF_THRESHOLD = 0.4        # detection confidence threshold
TARGET_CLASSES = {"drone", "uav", "helicopter"}  # labels treated as drones


@dataclass
class Detection:
    """Struct for a single detection result."""

    frame: int
    timestamp: Optional[float]
    label: str
    confidence: float
    bbox: Dict[str, int]


class BBoxDict(TypedDict):
    x1: int
    y1: int
    x2: int
    y2: int


class DetectionDict(TypedDict):
    frame: int
    timestamp: Optional[float]
    label: str
    confidence: float
    bbox: BBoxDict


class VideoScanResult(TypedDict):
    frames_processed: int
    detections: List[DetectionDict]


model = YOLO(MODEL_PATH)


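def _is_target(name: str) -> bool:
    """Return True if the lowercased class name contains any target label."""
    # Substring match: "drone" also matches names such as "small-drone".
    return any(t in name for t in TARGET_CLASSES)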


def detect_frame(frame) -> List[Detection]:
    """Run YOLO on a single frame and return drone detections."""
    detections: List[Detection] = []
    results = model(frame, conf=CONF_THRESHOLD, verbose=False)
    boxes = results[0].boxes

    for box in boxes:
        cls_id = int(box.cls[0])
        name = results[0].names[cls_id].lower()
        if not _is_target(name):
            continue

        conf = float(box.conf[0])
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        detections.append(
            Detection(
                frame=-1,
                timestamp=None,
                label=name,
                confidence=conf,
                bbox={"x1": x1, "y1": y1, "x2": x2, "y2": y2},
            )
        )

    return detections


def _detection_to_dict(detection: Detection) -> DetectionDict:
    """Convert dataclass Detection to its TypedDict counterpart."""
    return DetectionDict(
        frame=detection.frame,
        timestamp=detection.timestamp,
        label=detection.label,
        confidence=detection.confidence,
        bbox=BBoxDict(
            x1=detection.bbox["x1"],
            y1=detection.bbox["y1"],
            x2=detection.bbox["x2"],
            y2=detection.bbox["y2"],
        ),
    )


def scan_video(source: Union[str, Path]) -> VideoScanResult:
    """Scan a video file and return all drone detections."""
    cap = cv2.VideoCapture(str(source))
    if not cap.isOpened():
        raise RuntimeError("Unable to open video source")

    fps = cap.get(cv2.CAP_PROP_FPS) or 0.0
    frames_processed = 0
    detections: List[DetectionDict] = []

    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break

            frame_detections = detect_frame(frame)
            for detection in frame_detections:
                detection.frame = frames_processed
                # Timestamp in seconds; None when the frame rate is unknown.
                detection.timestamp = frames_processed / fps if fps else None
                detections.append(_detection_to_dict(detection))

            frames_processed += 1
    finally:
        # Release the capture handle even if detection raises.
        cap.release()
    return {"frames_processed": frames_processed, "detections": detections}


def run_realtime(camera_index: Union[int, str] = CAMERA_INDEX) -> None:
    """Run real‑time drone detection using a camera feed."""
    cap = cv2.VideoCapture(camera_index)
    if not cap.isOpened():
        raise RuntimeError("Unable to open camera")

    prev_time = time.time()  # start from now so the first FPS value is meaningful
    drone_count = 0
    print("🔍 Drone detector started... press [ESC] to stop.")

    while True:
        ret, frame = cap.read()
        if not ret:
            print("⚠️ No frame received — restarting camera...")
            cap.release()
            time.sleep(1)
            cap = cv2.VideoCapture(camera_index)
            continue

        frame_detections = detect_frame(frame)
        detected = bool(frame_detections)

        for detection in frame_detections:
            box = detection.bbox
            cv2.rectangle(
                frame,
                (box["x1"], box["y1"]),
                (box["x2"], box["y2"]),
                (0, 255, 0),
                2,
            )
            cv2.putText(
                frame,
                f"{detection.label} {detection.confidence:.2f}",
                (box["x1"], box["y1"] - 10),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.6,
                (0, 255, 0),
                2,
            )

        curr_time = time.time()
        fps = 1 / (curr_time - prev_time + 1e-6)
        prev_time = curr_time
        cv2.putText(
            frame,
            f"FPS: {fps:.1f}",
            (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX,
            1,
            (255, 255, 255),
            2,
        )

        if detected:
            # Counts frames with at least one detection, not individual drones.
            drone_count += 1
            print(f"[{time.strftime('%H:%M:%S')}] 🚨 Drone detected (frames with detections: {drone_count})")

        cv2.imshow("Drone Detector", frame)
        if cv2.waitKey(1) & 0xFF == 27:
            break

    cap.release()
    cv2.destroyAllWindows()
    print("🛑 Stopped.")


if __name__ == "__main__":
    run_realtime()

Conclusion#

With relatively little code, YOLO enables a powerful and extensible object detection system. By combining explicit data modeling, type safety, and a clean separation between detection, visualization, and output, this drone detector serves as a solid foundation for real‑world computer vision applications.

YOLO is not just an impressive model—it is a practical tool for engineers who want to put computer vision to use.