System Implementation V2


Software Architecture

    1. Software Architecture Overview

    1.1. System Overview

    The DARPA Triage Drone system is designed to perform triage operations autonomously in complex environments. It uses a distributed, modular architecture realized as three core Docker containers, which provide consistent runtime environments and network separation while still allowing close coordination between components.

    The system is logically divided into two main parts: the Ground Control Station (GCS) and the onboard drone systems.

    1.2. High-Level Components

    1.2.1. Ground Control Station (GCS)

    The GCS serves as the central command and monitoring hub. Its operator interface is provided through the Foxglove GUI, which supplies the essential visualization and control panels.

    1.2.2. Onboard Drone Systems

    The drone’s intelligence and operational functions are encapsulated within dedicated Docker containers:

    • Detection Container: Manages the drone’s perception (object detection, target GPS estimation) and related gimbal control (target tracking).
    • Robot Container: Responsible for autonomous decision-making and execution, this container employs a custom behavior management system for high-level control.

    Together, these components enable the drone to navigate challenging environments and execute autonomous triage missions effectively.

    2. Ground Control Station (GCS) Subsystem

    2.1. Overview & Purpose

    The GCS subsystem provides the interface for controlling and monitoring the drone system. It is built on a modular ROS 2 architecture and deployed via Docker containers.

    2.2. Data Flow Management

    The system handles multiple asynchronous data streams using ROS 2 communication primitives (topics, services, actions) and specialized middleware; both link directions are sketched after this list:

    • Uplink (GCS to Drone): Operator commands entered via the GCS UI trigger ROS 2 service calls or action goals. These are processed and forwarded via MAVROS to the autonomy system’s command interface (specifically, the /behavior_tree_commands topic consumed by the BehaviorExecutive node inside the Robot container).
    • Downlink (Drone to GCS):
      • Telemetry: Flight controller data streams via MAVLink are translated by MAVROS into ROS 2 topics. These topics are consumed by various processing nodes for analysis and visualization on the GCS interfaces.
      • Video: GStreamer pipelines manage incoming camera feeds for display to the operator.
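
    To make this concrete, the following is a minimal rclpy sketch of a GCS-side node that publishes an operator command on the uplink and consumes MAVROS telemetry on the downlink. The message type of /behavior_tree_commands is not specified in this document, so a std_msgs/String placeholder is used and the "AutoTakeoff" command string is hypothetical; /mavros/state and /mavros/global_position/global are standard MAVROS telemetry topics.

        import rclpy
        from rclpy.node import Node
        from rclpy.qos import qos_profile_sensor_data
        from std_msgs.msg import String               # placeholder; real command type not specified here
        from mavros_msgs.msg import State
        from sensor_msgs.msg import NavSatFix


        class GcsLinkSketch(Node):
            """Illustrative GCS-side node: behavior commands up, telemetry down."""

            def __init__(self):
                super().__init__('gcs_link_sketch')
                # Uplink: operator command forwarded to the autonomy command interface.
                self.cmd_pub = self.create_publisher(String, '/behavior_tree_commands', 10)
                # Downlink: MAVROS republishes flight-controller telemetry as ROS 2 topics.
                self.create_subscription(State, '/mavros/state', self.on_state, 10)
                self.create_subscription(NavSatFix, '/mavros/global_position/global',
                                         self.on_fix, qos_profile_sensor_data)

            def send_command(self, command: str):
                # In the real system this is triggered by a UI action (Foxglove/RQT panel).
                self.cmd_pub.publish(String(data=command))

            def on_state(self, msg: State):
                self.get_logger().info(f'armed={msg.armed} mode={msg.mode}')

            def on_fix(self, msg: NavSatFix):
                self.get_logger().info(f'lat={msg.latitude:.6f} lon={msg.longitude:.6f}')


        def main():
            rclpy.init()
            node = GcsLinkSketch()
            node.send_command('AutoTakeoff')          # hypothetical command string
            rclpy.spin(node)


        if __name__ == '__main__':
            main()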

    3. Onboard System Architecture

    3.1. Docker Containers Overview

    The drone’s onboard intelligence runs within two primary Docker containers: the Robot Container (housing the autonomy and control logic) and the Detection Container (housing the perception and gimbal control logic). This containerization provides isolated, consistent, and reproducible runtime environments.

    3.2. Robot Container: Autonomy Subsystem

    3.2.1. Runtime Environment (Containerization Details)

    The Robot Docker container is configured to provide the necessary operating system, libraries, and tools (a minimal dependency check is sketched after this list):

    • Base Image: NVIDIA L4T (Linux for Tegra) base images (e.g., dustynv/ros:humble-desktop-l4t-r36.4.0), ensuring compatibility with the Jetson architecture.
    • ROS 2 Humble Backbone: Includes core ROS 2 Humble and packages essential for robotics tasks:
      • mavros: For communication with MAVLink-based flight controllers (e.g., Pixhawk).
      • tf2: For managing coordinate transformations.
      • pcl-ros: For Point Cloud Library integration.
      • Vision-related packages (vision-msgs, image-view, stereo-image-proc): For potential image data handling (though primary vision processing is in the Detection container).
    • System Utilities: Installs essential system utilities and configures settings like timezone and locale (standardizing on en_US.UTF-8).
    • Development and Debugging Tools: Includes cmake, compilers (build-essential), python3-pip, text editors (vim), terminal multiplexers (tmux), and debuggers (gdb).
    • Software Dependencies: Installs critical libraries:
      • libopencv-dev: Core library for computer vision tasks.
      • Python Libraries: numpy, scipy, pymavlink, pyyaml, etc.
    • Remote Access Configuration (SSH): Installs and configures an openssh-server for remote access into the running container.
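
    As a quick sanity check of this environment, a short script along the following lines (a sketch, not part of the actual container build) can be run inside the Robot container to confirm that the listed Python-level dependencies and client libraries are importable. The assumption that mavros_msgs and tf2_ros are exposed as Python packages follows from the installed mavros and tf2 stacks.

        import importlib

        # Python-level dependencies listed above, plus ROS 2 / MAVROS client libraries.
        REQUIRED = ['numpy', 'scipy', 'pymavlink', 'yaml', 'cv2',
                    'rclpy', 'mavros_msgs', 'tf2_ros']


        def check_environment():
            missing = []
            for name in REQUIRED:
                try:
                    module = importlib.import_module(name)
                    version = getattr(module, '__version__', 'unknown')
                    print(f'OK      {name:<12} {version}')
                except ImportError:
                    missing.append(name)
                    print(f'MISSING {name}')
            return missing


        if __name__ == '__main__':
            # Non-zero exit makes the script usable as a container health check.
            if check_environment():
                raise SystemExit(1)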

    3.2.2. Software Architecture & Pipeline

    The autonomy software, primarily located within the robot/ros_ws/src/autonomy package, follows a layered architecture:

    • Layer 1: Interface
      • Purpose: Acts as the crucial bridge between the ROS 2 autonomy software and the robot’s hardware (Pixhawk via MAVLink) or external communication links (like the GCS). It abstracts hardware specifics.
      • Implementation: Leverages a pluginlib-based architecture centered around the robot_interface_node.
        • mavros_interface: The primary plugin, handling communication with the MAVLink flight controller via MAVROS. It translates ROS commands to MAVLink and vice-versa.
        • robot_interface: Potentially provides interfaces for other hardware (non-MAVLink sensors) or custom communication protocols.
        • interface_bringup: Contains launch files to start and configure the necessary interface nodes and load the appropriate plugin.
        • ROS API: Exposes services (e.g., robot_command for ARM, TAKEOFF, LAND) and subscribes to motion command topics (e.g., velocity_command, pose_command), delegating execution to the loaded interface plugin. Publishes status like is_armed.
    • Layer 2: Behavior
      • Purpose: Implements the highest level of decision-making, orchestrating the robot’s actions based on GCS commands, situational awareness (e.g., inter-UAV collision risk from /gps_list), and internal state (e.g., armed status from MAVROS).
      • Architecture (Event-Driven Command Manager): Based on the behavior_executive.cpp implementation, this layer does not use a standard Behavior Tree library. Instead, it employs a custom, event-driven command management system orchestrated by the BehaviorExecutive ROS 2 node (a minimal sketch of this pattern follows the list).
      • Custom Primitives: Utilizes simple, custom-defined classes:
        • bt::Condition: Serves as a flag or trigger. Some instances represent external command requests (set via /behavior_tree_commands); others reflect internal system states (updated via /mavros/state).
        • bt::Action: Encapsulates the execution logic for a specific, predefined behavior (e.g., AutoTakeoff, GoToPosition, GeofenceMapping). Each action manages a simple internal state (active, running, success, failure).
      • Execution Engine (BehaviorExecutive node):
        • ROS Interface:
          • Input/Triggers (Subscribers): Listens to /behavior_tree_commands (primary external trigger from GCS/RQT), /mavros/state (for internal conditions), target waypoints, /gps_list (deconfliction), etc.
          • Command Execution (MAVROS Service Clients): The logic within bt::Action objects calls MAVROS services to command the flight controller (e.g., mavros/set_mode, mavros/cmd/arming, mavros/mission/push).
        • Execution Logic (timer_callback): A periodic timer (20 Hz) drives the core loop.
          • No Tree Traversal: The loop iterates through a predefined list of bt::Action objects.
          • Activation Check: It checks if an Action has become newly active (likely triggered by its corresponding Condition being set).
          • Direct Command Dispatch: When an Action activates, its logic executes, primarily involving calls to MAVROS services. The Action’s state (SUCCESS/FAILURE) is often set immediately after dispatch.
      • Handling Complex Sequences: Multi-step behaviors (like hexagonal search) are managed via custom logic and additional ROS timers within specific Action implementations, not via standard BT control flow nodes.
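
    The behavior layer itself is implemented in C++ (behavior_executive.cpp); the following is a minimal Python sketch of the same event-driven pattern, with simplified Condition/Action stand-ins, a 20 Hz timer, and direct dispatch to the standard MAVROS services. The command message type (std_msgs/String placeholder), the "AutoTakeoff" command string, and the "GUIDED" flight mode are assumptions, not details taken from the actual code.

        import rclpy
        from rclpy.node import Node
        from std_msgs.msg import String                      # placeholder for the command message type
        from mavros_msgs.srv import CommandBool, SetMode     # standard MAVROS service types


        class Condition:
            """Flag set by an external command or an internal state update."""
            def __init__(self):
                self.value = False


        class Action:
            """Wraps the execution logic for one predefined behavior."""
            def __init__(self, name, condition, execute):
                self.name, self.condition, self.execute = name, condition, execute
                self.status = 'idle'

            def poll(self):
                # Activation check: fire when the triggering condition has been set.
                if self.condition.value:
                    self.condition.value = False      # consume the request
                    self.execute()                    # direct command dispatch
                    self.status = 'success'           # state set immediately after dispatch


        class BehaviorExecutiveSketch(Node):
            def __init__(self):
                super().__init__('behavior_executive_sketch')
                self.mode_cli = self.create_client(SetMode, 'mavros/set_mode')
                self.arm_cli = self.create_client(CommandBool, 'mavros/cmd/arming')

                self.takeoff_requested = Condition()
                self.actions = [Action('AutoTakeoff', self.takeoff_requested, self.do_takeoff)]

                self.create_subscription(String, '/behavior_tree_commands', self.on_command, 10)
                self.create_timer(1.0 / 20.0, self.timer_callback)   # 20 Hz core loop

            def on_command(self, msg: String):
                if msg.data == 'AutoTakeoff':                        # hypothetical command string
                    self.takeoff_requested.value = True

            def do_takeoff(self):
                # Dispatch straight to the flight controller via MAVROS services.
                self.mode_cli.call_async(SetMode.Request(custom_mode='GUIDED'))  # mode name assumed
                self.arm_cli.call_async(CommandBool.Request(value=True))

            def timer_callback(self):
                # No tree traversal: iterate a flat, predefined list of actions.
                for action in self.actions:
                    action.poll()


        def main():
            rclpy.init()
            rclpy.spin(BehaviorExecutiveSketch())


        if __name__ == '__main__':
            main()

    The key property illustrated is the absence of tree traversal: a flat list of actions is polled every cycle, and each action fires once when its triggering condition has been set.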

    3.3. Detection Container: Perception & Gimbal Control Subsystem

    3.3.1. Overview

    This container houses the drone’s “eyes” and the control for directing them. It comprises two main nodes working together: inted_gimbal for perception and payload_node for gimbal control and target tracking.

    3.3.2. inted_gimbal Node (Perception Core)

    • Purpose & Modules: Integrates several capabilities using internal modules:
      • Video Stream Processing (Streaming module): Acquires input from the Gremsy gimbal-mounted FLIR 640R camera via RTSP (using ffmpeg), handles frame retrieval, and resizes frames for publishing/inference.
      • Object Detection (Detection module / yolo.py): Employs YOLO for real-time object detection and potentially pose estimation. Filters detections to identify persons (class ID “0”).
      • GPS Coordinate Estimation (GPSEstimator module): Estimates the real-world GPS coordinates of detected persons using the object’s bounding box and the drone’s current MAVROS state (GPS, altitude, heading). Manages a list of unique targets (a flat-ground projection sketch follows this list).
    • ROS Interface (Outputs):
      • Publishes the processed video (image_raw).
      • Publishes raw detection bounding boxes (/detection_box, vision_msgs/Detection2DArray).
      • Publishes newly estimated target GPS coordinates (detection/gps_target).
      • Publishes a periodic list of all uniquely tracked target GPS coordinates (target_gps_list).
    • Performance: Uses separate threads for streaming and detection/estimation.
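
    One common way to implement such an estimate is a flat-ground projection: cast a ray from the camera through the bounding-box centre using a pinhole model and intersect it with the ground plane at the drone’s altitude AGL. The sketch below illustrates the idea; the intrinsics, the angle conventions, and the small-area metres-per-degree conversion are assumptions, and the actual GPSEstimator module may use a different approach.

        import math

        # Hypothetical camera intrinsics (pixels); real values come from calibration.
        FX, FY, CX, CY = 800.0, 800.0, 320.0, 256.0
        M_PER_DEG_LAT = 111_320.0                    # rough metres per degree of latitude


        def estimate_target_gps(u, v, drone_lat, drone_lon, alt_agl_m, heading_deg,
                                gimbal_pitch_down_deg, gimbal_yaw_deg=0.0):
            """Project a bounding-box centre (u, v) onto flat ground and return (lat, lon).

            heading_deg: drone heading, clockwise from true north.
            gimbal_pitch_down_deg: camera depression below the horizon (90 = straight down).
            Returns None if the ray does not intersect the ground ahead of the drone.
            """
            # Angular offset of the pixel from the optical axis.
            ang_right = math.atan2(u - CX, FX)                       # + target right of centre
            ang_down = math.atan2(v - CY, FY)                        # + target below centre

            depression = math.radians(gimbal_pitch_down_deg) + ang_down
            if depression <= 0.0:
                return None                                          # at or above the horizon

            ground_range = alt_agl_m / math.tan(depression)          # horizontal distance (m)
            bearing = math.radians(heading_deg + gimbal_yaw_deg) + ang_right

            north = ground_range * math.cos(bearing)
            east = ground_range * math.sin(bearing)

            lat = drone_lat + north / M_PER_DEG_LAT
            lon = drone_lon + east / (M_PER_DEG_LAT * math.cos(math.radians(drone_lat)))
            return lat, lon


        if __name__ == '__main__':
            # Example: person detected right of centre while hovering at 30 m AGL.
            print(estimate_target_gps(u=400, v=300, drone_lat=40.4433, drone_lon=-79.9436,
                                      alt_agl_m=30.0, heading_deg=90.0,
                                      gimbal_pitch_down_deg=45.0))

    In the actual node, the drone position and heading would come from the subscribed MAVROS state rather than being passed in as arguments.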

    3.3.3. payload_node Node (Gimbal Control & Tracking)

    • Purpose & Architecture: Controls the physical gimbal orientation based on external commands and detection feedback from inted_gimbal. Uses a state machine (MAPPING, MANUAL_CONTROL, LOCK_ON, SPIN).
    • ROS Interface:
      • Inputs: Subscribes to /gimbal_command (state changes), /current_gimbal_angles (feedback), and crucially /detection_box (from inted_gimbal).
      • Output: Publishes target gimbal angles (/gimbal_angles).
    • Workflow & State Logic:
      • State Transitions: Driven by commands received on /gimbal_command.
      • Timer-Driven Actions: Executes predefined movements in MAPPING (point down) and SPIN (sweep) states.
      • Detection-Driven Action (detection_callback in LOCK_ON state):
        • Processes Detection2DArray messages from inted_gimbal.
        • Uses the first detected person’s bounding box center.
        • Calls track.py::compute_gimbal_command (using detection, current angles, camera intrinsics) to calculate new pitch/yaw needed to center the target.
        • Publishes the updated target angles to /gimbal_angles, implementing visual servoing (a minimal sketch of this loop follows).
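
    The sketch below ties these pieces together as a minimal LOCK_ON-style node: it consumes Detection2DArray messages from /detection_box, keeps the latest angles from /current_gimbal_angles, and publishes corrected angles to /gimbal_angles so that the first detected person is driven toward the image centre. The camera intrinsics, the proportional gain, the use of geometry_msgs/Vector3 for the angle topics, and the pitch/yaw sign conventions are assumptions; the inline correction stands in for track.py::compute_gimbal_command rather than reproducing it.

        import math

        import rclpy
        from rclpy.node import Node
        from geometry_msgs.msg import Vector3            # placeholder type for the angle topics
        from vision_msgs.msg import Detection2DArray

        FX, FY, CX, CY = 800.0, 800.0, 320.0, 256.0      # hypothetical camera intrinsics
        GAIN = 0.5                                       # fraction of the error corrected per update


        class LockOnSketch(Node):
            """Closed-loop visual tracking: detections in, corrected gimbal angles out."""

            def __init__(self):
                super().__init__('lock_on_sketch')
                self.current = Vector3()                 # x = pitch, y = yaw (assumed convention)
                self.cmd_pub = self.create_publisher(Vector3, '/gimbal_angles', 10)
                self.create_subscription(Vector3, '/current_gimbal_angles', self.on_angles, 10)
                self.create_subscription(Detection2DArray, '/detection_box', self.on_detections, 10)

            def on_angles(self, msg: Vector3):
                self.current = msg

            def on_detections(self, msg: Detection2DArray):
                # Keep only person detections (class ID "0", published as a string).
                persons = [d for d in msg.detections
                           if d.results and d.results[0].hypothesis.class_id == '0']
                if not persons:
                    return
                centre = persons[0].bbox.center.position   # vision_msgs 4.x (Humble) field layout
                # Convert the pixel error into angle corrections that re-centre the target.
                yaw_step = GAIN * math.degrees(math.atan2(centre.x - CX, FX))
                pitch_step = GAIN * math.degrees(math.atan2(centre.y - CY, FY))
                self.cmd_pub.publish(Vector3(x=self.current.x - pitch_step,
                                             y=self.current.y + yaw_step))


        def main():
            rclpy.init()
            rclpy.spin(LockOnSketch())


        if __name__ == '__main__':
            main()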

    3.3.4. Interaction (inted_gimbal <> payload_node)

    • inted_gimbal performs perception and publishes detection results on /detection_box.
    • payload_node consumes these detections.
    • In the LOCK_ON state, payload_node uses the detection data to calculate and send real-time angle commands to the gimbal, creating a closed-loop visual tracking system.