System Implementation

Hardware

Our hardware setup consists of the following –

  • KUKA LBR Med 7
  • Cameras – RealSense D405 / D435
  • Vention Table
  • Custom EE with Drill
  • Bones with Bone Mount
  • 3D-Printed stencils for verification

The overall setup is shown in Fig 1. The KUKA robot was provided to us by our sponsor, Smith+Nephew. We are also grateful to Prof. Oliver Kroemer who generously provided a Vention table, which we later assembled.

Fig 1 – Hardware Setup

We designed and 3D-printed a custom end-effector that supports both the Dremel 9100 drill and the D405 camera. The first iteration of our design is shown in Fig 2. We also built a bone mount that can hold the knee joint in flexion (Fig 3).

Fig 2 – EE Design
Fig 3 – Bone Mount

Drill Subsystem

For the task of autonomous drilling, we have to interface a drill with our manipulator. After an extensive trade study, we selected the Dremel 9100, as it met our desired torque, RPM, weight, and form factor requirements. This Dremel comes with a foot pedal for actuation, which led us to hypothesize that it would be relatively straightforward to replace the foot pedal with a custom electronic actuation setup (see Fig 4).

Fig 4 – Dremel 9100

As seen in Fig 5, we examined the internal circuit of the foot pedal and realized that it operates like a potentiometer. To replace the pedal’s spring compression with an electronic control signal, we decided to use a solid-state relay. For the input to the solid-state relay, we used an Arduino UNO, which provides a 5 V digital high output when the drill should run and 0 V when it should stop; the output side of the relay switches the AC mains supply to the drill. The Arduino thus receives a “drill” or “stop” command over serial and converts it to a digital high or low, respectively, on pin 13.

Fig 5 – Interfacing via the Dremel foot pedal

This serial command is sent by a ROS 2 node using PySerial when the manipulator reaches the desired goal. Along with the solid-state relay, we have added a dimmer-switch potentiometer to the circuit, which allows the surgeon to manually control the speed of the drill in case of emergency; this functionality is separate from the autonomous drill control.
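
As a reference for how these pieces fit together, a minimal sketch of such a node is shown below. It is an illustrative sketch only: the topic name /drill_cmd, the serial port path, and the baud rate are assumptions rather than our exact configuration.

# Illustrative sketch of the drill-actuation ROS 2 node: it listens on a
# String topic and forwards "drill"/"stop" over serial to the Arduino.
# Topic name, serial port, and baud rate are placeholder assumptions.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import serial


class DrillNode(Node):
    def __init__(self):
        super().__init__('drill_node')
        # Serial link to the Arduino UNO that drives the solid-state relay.
        self.ser = serial.Serial('/dev/ttyACM0', 9600, timeout=1.0)
        self.sub = self.create_subscription(String, '/drill_cmd', self.on_cmd, 10)

    def on_cmd(self, msg):
        cmd = msg.data.strip().lower()
        if cmd in ('drill', 'stop'):
            # The Arduino sets pin 13 high for "drill" and low for "stop".
            self.ser.write((cmd + '\n').encode('ascii'))
            self.get_logger().info('Sent "%s" to drill controller' % cmd)
        else:
            self.get_logger().warn('Ignoring unknown command "%s"' % cmd)


def main():
    rclpy.init()
    node = DrillNode()
    try:
        rclpy.spin(node)
    finally:
        node.ser.write(b'stop\n')  # fail-safe: stop the drill on shutdown
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()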

To ensure the safety of our entire setup, we have added a bright red, normally closed emergency stop on the mains side of the circuit. Pressing this button immediately cuts all power to the drill.

We tested this system at multiple levels: first by writing commands directly over Arduino serial, then with Python code writing to serial, and third with a user manually publishing to the topic that the drill node subscribes to. Only after we were confident in its safety and functionality did we move to the fourth and final stage of testing, in which the manipulator autonomously actuates the drill.

Perception

The primary task of the perception module is to perform registration, i.e., to localize the bone in front of the robot so that we know where to drill. The idea is to collect a point cloud scene of the bone setup using the D405 camera and register the STL file we have of the bone to the scene. We started by using Open3D’s Point-to-Plane ICP to attempt registration, but the results were not accurate enough. Fig 6 below shows a sample point cloud we collected of a 3D-printed bone and our registration results.

Fig 6 – ICP Initial Results
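
For reference, a minimal sketch of this first attempt is shown below, assuming the scene is loaded from a saved .ply file and the bone model from its STL; the file names, sample count, and 1 cm correspondence threshold are illustrative placeholders, and the initial transformation is simply the identity.

# Minimal sketch of the initial Point-to-Plane ICP attempt with Open3D.
# File paths, sample counts, and thresholds are illustrative placeholders.
import numpy as np
import open3d as o3d

scene = o3d.io.read_point_cloud("scene.ply")            # captured D405 point cloud
bone_mesh = o3d.io.read_triangle_mesh("femur.stl")      # bone model (STL)
source = bone_mesh.sample_points_uniformly(number_of_points=20000)

# Point-to-plane ICP requires normals on the target point cloud.
scene.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))

result = o3d.pipelines.registration.registration_icp(
    source, scene, 0.01, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPlane())
print(result.fitness, result.inlier_rmse, result.transformation)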

We later realized that this was because we were not properly processing and filtering the point cloud to remove noise. We therefore built a point cloud processing pipeline that uses techniques such as RANSAC plane segmentation and clustering to extract just the bone region (Fig 7). The extracted bone point cloud is shown in Fig 8.

Fig 7 – Point cloud processing pipeline

Fig 8 – Extracted bone point cloud
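
A simplified sketch of the core filtering idea, assuming the table is the dominant plane in the scene, is shown below; all parameter values are illustrative rather than our exact settings.

# Simplified sketch of the bone-extraction idea: remove the dominant plane
# (assumed to be the table) with RANSAC, then keep the largest remaining
# Euclidean cluster as the bone. All parameter values are illustrative.
import numpy as np
import open3d as o3d

scene = o3d.io.read_point_cloud("scene.ply")
scene = scene.voxel_down_sample(voxel_size=0.003)

# RANSAC plane segmentation: fit the dominant plane and drop its inliers.
plane_model, inliers = scene.segment_plane(
    distance_threshold=0.005, ransac_n=3, num_iterations=1000)
objects = scene.select_by_index(inliers, invert=True)

# DBSCAN clustering: keep the largest cluster, assumed to be the bone.
labels = np.array(objects.cluster_dbscan(eps=0.01, min_points=20))
largest = np.bincount(labels[labels >= 0]).argmax()
bone_cloud = objects.select_by_index(np.where(labels == largest)[0])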

After extracting the bone region, we attempted ICP registration again, but it still did not produce satisfactory results. We therefore tried a deep learning-based method called Feature Metric Registration, which achieves much better alignment, as shown in Fig 9.

Fig 9 – Feature Metric Registration

However, this was only tested on a single scene and may not be representative of our actual data distribution, so we collected a point cloud dataset using the end-effector-mounted D405 and the actual bone setup (Fig 10).

Fig 10 – Point cloud dataset

As expected, the above algorithms did not perform well on this data, and we realized we needed to take a more structured approach to the problem.

The overall registration pipeline consists of three stages, as shown in Fig 11.

Fig 11 – Registration Pipeline

After exploring various methods, we found that a custom heuristic works best for global registration: we simply align the centers of the two point clouds. We then perform ICP in a RANSAC-like fashion with 100+ trials. In each trial, we first add Gaussian noise to the initial transformation obtained from global registration (Fig 12).

Fig 12 – How the initial transformation varies in each trial due to Gaussian noise

We then run ICP in each trial and return the transformation with the best fitness score across all trials. This yields a fairly robust registration that worked for 12 of the 16 samples in our dataset. Results across different point clouds are shown in Fig 13, and the final overlaid result is shown in Fig 14.
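
A condensed sketch of this procedure is shown below; the noise scales, correspondence threshold, and trial count are representative values rather than our exact settings, and centroid alignment stands in for the global registration heuristic described above.

# Condensed sketch of the RANSAC-style ICP refinement: perturb the
# centroid-aligned initial transform with Gaussian noise in each trial,
# run ICP, and keep the transform with the best fitness score.
import numpy as np
import open3d as o3d

def centroid_init(source, target):
    """Global registration heuristic: translate source centroid onto target centroid."""
    T = np.eye(4)
    T[:3, 3] = target.get_center() - source.get_center()
    return T

def perturb(T, trans_sigma=0.01, rot_sigma=0.1):
    """Apply Gaussian noise to the translation and small rotations about each axis."""
    noise = np.eye(4)
    angles = np.random.normal(0.0, rot_sigma, 3)
    noise[:3, :3] = o3d.geometry.get_rotation_matrix_from_xyz(angles)
    noise[:3, 3] = np.random.normal(0.0, trans_sigma, 3)
    return noise @ T

def ransac_icp(source, target, trials=100, threshold=0.01):
    T0 = centroid_init(source, target)
    best = None
    for _ in range(trials):
        result = o3d.pipelines.registration.registration_icp(
            source, target, threshold, perturb(T0),
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        if best is None or result.fitness > best.fitness:
            best = result                      # keep the best-fitness trial
    return best.transformation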

Fig 13 – Registration Result
Fig 14 – Registered bone overlaid on top of the overall scene

Finally, we performed camera calibration and refactored the entire Open3D pipeline into a ROS 2 package so that it works with the live RealSense feed and integrates with the rest of our manipulation stack (Fig 15).

Fig 15 – ROS Registration Pipeline with Live Data

The overall pipeline is fully functional and has been integrated with our manipulation pipeline. This is what we used for our Spring Demonstrations, where we achieved <8 mm error. However, we assume certain priors for registration that limit the system’s ability to generalize to other scenarios –

  • The femur is always on the left
  • The femur lies within the manual crop region
  • The femur makes an angle of less than 45 degrees with the horizontal axis

We have therefore been developing an alternative registration pipeline that leverages state-of-the-art foundation models. It combines the extraction and global registration steps, as shown below in Fig 16.

Fig 16 – New Registration Pipeline

The pipeline consists of the following steps, each of which is shown visually in Fig 17 –

  • First, we use GroundingDINO with the prompt “bone” to find an initial bounding box. This step replaces the manual crop in the original pipeline with a more dynamic variant. We also tried the prompt “femur”, but that still returns the overall bone region instead of just the femur.
  • Then we project the point cloud within this box to a depth image of the region and extract the point with minimum depth (since, with the knee in flexion, this point will almost always lie on the femur).
  • We then use Segment Anything (SAM) to get a mask of just the femur region. SAM can take either a bounding box or a specific point as input, but we cannot directly give it the DINO bounding box since that covers the whole bone. Instead, we give SAM the minimum-depth point computed in the previous step, which is guaranteed to lie on the femur, to obtain a femur mask.
  • By fitting a minimum-area rectangle (cv2.minAreaRect) to the mask with OpenCV, we are also able to extract the orientation (the red vector), which serves as a good initial registration estimate. These steps are sketched below.

Fig 17 – DINO/SAM Pipeline
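
The sketch below outlines the extraction steps under our assumptions: get_bone_box() is a hypothetical stand-in for the GroundingDINO “bone” query, and the SAM model type and checkpoint path are placeholders.

# Sketch of the extraction step: the minimum-depth point inside the DINO box
# is used as a point prompt for SAM, and cv2.minAreaRect over the resulting
# mask gives an orientation estimate. get_bone_box() is a hypothetical
# stand-in for GroundingDINO; the SAM model type/checkpoint are placeholders.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def femur_mask_and_orientation(rgb, depth, get_bone_box):
    x0, y0, x1, y1 = get_bone_box(rgb)          # GroundingDINO, prompt "bone"

    # Minimum (valid) depth pixel inside the box: with the knee in flexion,
    # this point almost always lies on the femur.
    roi = depth[y0:y1, x0:x1].astype(np.float32)
    roi[roi <= 0] = np.inf                      # ignore invalid depth readings
    v, u = np.unravel_index(np.argmin(roi), roi.shape)
    femur_point = np.array([[x0 + u, y0 + v]])  # (x, y) pixel coordinates

    # SAM with a single positive point prompt instead of the whole-bone box.
    predictor.set_image(rgb)
    masks, scores, _ = predictor.predict(
        point_coords=femur_point, point_labels=np.array([1]))
    mask = masks[np.argmax(scores)].astype(np.uint8)

    # Minimum-area rectangle over the mask contour gives an orientation estimate.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    (cx, cy), (w, h), angle = cv2.minAreaRect(max(contours, key=cv2.contourArea))
    return mask, (cx, cy), angle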

Finally, we find the points that lie within this mask (using the camera intrinsics) and thus obtain a target region consisting of just the femur. We then use RANSAC ICP as in the original pipeline to refine and obtain the final alignment. Results are shown in Fig 18.
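
This masking step can be sketched as follows, assuming a standard pinhole model with intrinsics fx, fy, cx, cy taken from the camera info; ransac_icp refers to the trial-based refinement from the original pipeline.

# Sketch of selecting the femur-only target region: project each 3D point
# through the pinhole intrinsics and keep it if its pixel falls inside the
# SAM mask. fx, fy, cx, cy come from the camera intrinsics.
import numpy as np
import open3d as o3d

def points_in_mask(scene, mask, fx, fy, cx, cy):
    pts = np.asarray(scene.points)              # points in the camera frame (meters)
    idx = np.where(pts[:, 2] > 0)[0]            # keep points in front of the camera
    x, y, z = pts[idx, 0], pts[idx, 1], pts[idx, 2]
    u = np.round(x * fx / z + cx).astype(int)   # pinhole projection to pixels
    v = np.round(y * fy / z + cy).astype(int)
    h, w = mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    on_femur = np.zeros(len(idx), dtype=bool)
    on_femur[inside] = mask[v[inside], u[inside]] > 0
    return scene.select_by_index(idx[on_femur])

# Usage: restrict the target to the femur, then refine with RANSAC ICP.
# target = points_in_mask(scene_cloud, femur_mask, fx, fy, cx, cy)
# T_final = ransac_icp(source_cloud, target)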

Fig 18 – Registration Results. Blue region is the unprojected SAM mask. Red is the source.

For future work, we will incorporate the additional top-down D435 camera, which will be used to track motion for dynamic compensation. This will also require us to improve the speed of our registration pipeline: it currently takes a few seconds per registration, and we eventually want to register multiple times per second.

Planning / Manipulation

Fig 19 – Manipulation Subsystem Overview

For our manipulation stack, we use MoveIt for motion planning via the ros2_lbr_fri package. We first set up the package in simulation with Gazebo (Fig 20) and synced it with our manipulator, enabling us to control the KUKA arm with MoveIt. The overall architecture is shown in Fig 19.

Fig 20 – Rviz with MoveIt and Gazebo

However, the above example was controlled via the GUI, and we need to control the arm programmatically, so we started working with MoveGroupInterface.

MoveGroupInterface

To plan for the KUKA arm, we need to write a MoveGroupInterface node to communicate with the KUKA MoveGroup. Because our MoveGroup is launched under the /lbr namespace, we also launched the MoveGroupInterface under the same namespace and remapped the action server/client so that the MoveGroupInterface can communicate with the MoveGroup. With the MoveGroupInterface, we can select the planner used for planning and add static obstacles (motor mount, table, and virtual walls) to the planning scene (Fig 21). We later removed the static obstacles for the final integration to make planning more predictable.

Fig 21 – Static Obstacles in Planning Scene

Surgical Plan

To make the drill reach the desired drilling pose, we first need to define the surgical plan. A surgical plan consists of a drilling point and a drilling vector. To define it on an STL file, we choose three points that form a face, use the normal vector of that plane as our drilling vector, and take a point on the face (the last of the three points in counter-clockwise order) as the drilling point. With SolidWorks (Fig 22), we can easily export the STL file and load it into Open3D (Fig 23).

Fig 22 – Surgical Plan in SolidWorks
Fig 23 – Surgical Plan in Open3D
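
The drilling vector and drilling point can be recovered from the three chosen vertices as in the small sketch below; the coordinates shown are hypothetical.

# Sketch of deriving the surgical plan from three face vertices chosen in
# counter-clockwise order: the plane normal is the drilling vector and the
# last vertex is the drilling point. Coordinates below are hypothetical.
import numpy as np

p1, p2, p3 = (np.array([0.10, 0.02, 0.30]),
              np.array([0.11, 0.03, 0.30]),
              np.array([0.10, 0.03, 0.31]))     # counter-clockwise order

normal = np.cross(p2 - p1, p3 - p1)             # normal of the face plane
drill_vector = normal / np.linalg.norm(normal)  # unit drilling vector
drill_point = p3                                # last point = drilling point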

Integration with Perception Sub-system

Finally, we take the pose provided by the perception system after registration and command the manipulator to move to that pose (Fig 24).

Fig 24 – Moving to Surgical Plan in Simulation
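
As a sketch of this hand-off, the snippet below converts the surgical plan (defined in the bone-model frame) and the registration result into a target drill pose in the robot base frame. The frame names, the availability of T_base_camera from hand-eye calibration, and the convention that the end-effector z-axis is the drill axis are assumptions for illustration, not our exact implementation.

# Sketch of converting the surgical plan plus registration result into a
# drill target pose in the robot base frame. T_base_camera (hand-eye
# calibration) and T_camera_bone (registration) are assumed 4x4 matrices,
# and the end-effector z-axis is assumed to be the drill axis.
import numpy as np
from scipy.spatial.transform import Rotation as R

def drill_pose(drill_point, drill_vector, T_base_camera, T_camera_bone):
    T_base_bone = T_base_camera @ T_camera_bone

    # Transform the drilling point and direction into the robot base frame.
    p = (T_base_bone @ np.append(drill_point, 1.0))[:3]
    v = T_base_bone[:3, :3] @ drill_vector
    v /= np.linalg.norm(v)

    # Build an orientation whose z-axis (drill axis) aligns with the drilling
    # vector; the x-axis is an arbitrary perpendicular direction.
    x = np.cross([0.0, 0.0, 1.0], v)
    if np.linalg.norm(x) < 1e-6:                 # drilling vector parallel to base z
        x = np.array([1.0, 0.0, 0.0])
    x /= np.linalg.norm(x)
    y = np.cross(v, x)
    quat = R.from_matrix(np.column_stack([x, y, v])).as_quat()  # (x, y, z, w)
    return p, quat                               # position + orientation goal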

We also executed this on the actual robot. Initially, there was some calibration error, so the manipulator did not quite reach the desired goal (see Fig 25).

Fig 25 – System with slight calibration error

After refining the registration pipeline and re-calibrating the extrinsic matrix, we achieved better accuracy, as shown in Fig 26.

Fig 26 – System after fixing calibration error

Full System Integration

We put all the subsystems together and measured the final drilling accuracy to evaluate the system. Our target criterion is to be within 8 mm of the desired goal. In practice, we observe that our system is always within 8 mm and more often than not within 4 mm. The final drilling results are shown in Fig 27.

Fig 27 – Final Results. Holes are within 4 mm of the desired position.