Motion Planning
Motion planning currently runs in two phases. In the first phase, the single-arm controller takes in the grasp points from object pose estimation and executes a trajectory for each arm separately. The arms do not need to coordinate at this stage, so they may reach their grasp points at different times. In the second phase, once both arms have grasped the object, they must work together essentially as one system to maneuver it, following the manipulation policy.
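Once both grippers are attached, the object and the two arms form a closed kinematic chain, so both end-effector targets follow rigidly from a single commanded object pose. A minimal sketch of that relationship (frames and grasp offsets below are illustrative, not our actual calibration):

```python
import numpy as np

def pose_to_matrix(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Fixed grasp offsets (object frame -> each gripper), captured at grasp time.
T_obj_left = pose_to_matrix(np.eye(3), [0.0, 0.15, 0.0])
T_obj_right = pose_to_matrix(np.eye(3), [0.0, -0.15, 0.0])

def gripper_targets(T_world_obj):
    """Both gripper poses follow rigidly from the commanded object pose."""
    return T_world_obj @ T_obj_left, T_world_obj @ T_obj_right

# Example: command the object to a new pose; both arm targets come for free.
T_cmd = pose_to_matrix(np.eye(3), [0.3, 0.0, 0.5])
left, right = gripper_targets(T_cmd)
```

This is why phase two can treat the dual-arm system as one plant: the manipulation policy outputs one object trajectory, and the per-arm targets are derived from it.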
Environment Setup
- Successfully configured a dual-arm system in MoveIt2 using the Kinova Kortex platform
- Extended the official ROS2 single-arm package to support two arms in the same environment
- Added environmental objects (Vention table and sample bin) for collision avoidance
- Established collision awareness between all components in the workspace
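In MoveIt2 this kind of dual-arm setup is typically expressed in the SRDF by nesting the two arm groups inside a combined planning group. The fragment below is a sketch with placeholder group and link names, not the actual names from the Kortex description package:

```xml
<!-- Sketch of a dual-arm SRDF; group and link names are placeholders. -->
<robot name="dual_gen3">
  <group name="left_arm">
    <chain base_link="left_base_link" tip_link="left_end_effector_link"/>
  </group>
  <group name="right_arm">
    <chain base_link="right_base_link" tip_link="right_end_effector_link"/>
  </group>
  <!-- The combined group lets MoveIt2 plan for both arms at once. -->
  <group name="both_arms">
    <group name="left_arm"/>
    <group name="right_arm"/>
  </group>
  <!-- Links that can never touch are excluded to speed up collision checks. -->
  <disable_collisions link1="left_base_link" link2="right_base_link" reason="Adjacent"/>
</robot>
```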

Motion Planning Capabilities
- Implemented RRT* planning algorithm for trajectory generation
- Configured ROS2 controllers according to Kinova software requirements
- Achieved simultaneous planning and execution for both arms
- Successfully tested planning and execution in fake hardware mode before connecting to Mujoco simulator or real hardware
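The actual RRT* planning is handled by MoveIt2/OMPL in joint space; purely as an illustration of the algorithm's choose-parent and rewire steps, here is a minimal obstacle-free 2D sketch (all names and parameters are ours, not MoveIt2's):

```python
import numpy as np

rng = np.random.default_rng(0)

def rrt_star(start, goal, n_iter=2000, step=0.5, radius=1.0, bounds=(0.0, 10.0)):
    """Minimal RRT* in an obstacle-free 2D workspace."""
    nodes = [np.asarray(start, float)]
    parent = {0: None}
    cost = {0: 0.0}

    def nearest(p):
        return int(np.argmin([np.linalg.norm(n - p) for n in nodes]))

    for _ in range(n_iter):
        # Sample a random point, with a small bias toward the goal.
        sample = np.asarray(goal, float) if rng.random() < 0.05 else rng.uniform(*bounds, 2)
        i = nearest(sample)
        direction = sample - nodes[i]
        dist = np.linalg.norm(direction)
        if dist < 1e-9:
            continue
        new = nodes[i] + direction / dist * min(step, dist)
        # Choose the lowest-cost parent among nearby nodes (the "*" in RRT*).
        near = [j for j, n in enumerate(nodes) if np.linalg.norm(n - new) <= radius]
        best = min(near, key=lambda j: cost[j] + np.linalg.norm(nodes[j] - new))
        k = len(nodes)
        nodes.append(new)
        parent[k] = best
        cost[k] = cost[best] + np.linalg.norm(nodes[best] - new)
        # Rewire neighbours through the new node when that is cheaper.
        for j in near:
            c = cost[k] + np.linalg.norm(nodes[j] - new)
            if c < cost[j]:
                parent[j], cost[j] = k, c
    # Walk back from the tree node closest to the goal.
    i = nearest(np.asarray(goal, float))
    path = []
    while i is not None:
        path.append(nodes[i])
        i = parent[i]
    return path[::-1]

path = rrt_star((0.5, 0.5), (9.5, 9.5))
```

The rewiring step is what distinguishes RRT* from plain RRT: paths keep shortening as more samples arrive, which matters when both arms must share a cluttered workspace.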


Autonomous Operation
- Developed a dedicated ROS2 package that automates the planning and execution process
- Replaced manual goal-setting in RViz with programmatic goal specification
- Created a node structure that subscribes to pose estimation data
- Established a pipeline where pose data feeds directly into motion execution
3D Reconstruction
Initial Approach with SAM1
- Used bounding box initialization with Segment Anything Model (SAM1) for object segmentation
- Mask propagation struggled with obstructions, resulting in noisy point clouds
Improved Approach with SAM2
- Implemented point-based initialization for more flexible segmentation
- Enhanced mask propagation to track objects across frames
- Added negative selection tool to exclude obstructions and robotic arms
- Results: Reduced noise in 3D reconstructions and minimized manual intervention
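Negative point prompts steer SAM2 away from the arms and obstructions; the end result is equivalent to removing the unwanted region from the object mask before back-projection, as in this synthetic boolean-mask sketch:

```python
import numpy as np

# Synthetic masks: True where the segmenter believes the class is present.
object_mask = np.zeros((6, 6), dtype=bool)
object_mask[1:5, 1:5] = True          # object proposal from positive prompts

arm_mask = np.zeros((6, 6), dtype=bool)
arm_mask[3:6, 3:6] = True             # robot arm overlapping the object

# Excluding the arm region mirrors what the negative prompt achieves.
clean_mask = object_mask & ~arm_mask

# Only pixels inside the cleaned mask get back-projected to the point cloud.
kept_pixels = np.argwhere(clean_mask)
```

Every arm pixel excluded here is one fewer spurious point in the reconstruction, which is why this step directly reduced point-cloud noise.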

Denoising Techniques Development
- Tested multiple denoising methods after finding that SAM2-based segmentation still produced dense noise in the point clouds
- Region Growing: Successfully reduced noise by setting density thresholds
- Normal-based Filtering: Preserved important surface features while removing noise
- Combined Approach: Applied region growing with normals followed by median filtering
- Used bounding box segmentation to address patchiness from distinct segments
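The density-threshold idea behind the region-growing step can be sketched in plain numpy (brute force for clarity; the radius and threshold values are illustrative, and a real pipeline would use a KD-tree):

```python
import numpy as np

def density_filter(points, radius=0.05, min_neighbors=5):
    """Keep points that have at least `min_neighbors` others within `radius`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Subtract 1 so a point does not count itself as its own neighbor.
    neighbors = (dist < radius).sum(axis=1) - 1
    return points[neighbors >= min_neighbors]

rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.01, size=(50, 3))    # dense surface patch
outliers = rng.uniform(1.0, 2.0, size=(5, 3))    # sparse sensor noise
cleaned = density_filter(np.vstack([cluster, outliers]))
```

Sparse outliers fail the neighbor count and are dropped, while the dense surface patch survives intact; normal-based and median filtering then act on what remains.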


Motion Strategies for Improved Scanning
- Evaluating deterministic motion strategies for structured scanning
- Testing 360° rotations along different axes to optimize object exposure
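A deterministic 360° scan about a chosen axis reduces to generating evenly spaced orientation waypoints; a sketch using Rodrigues' rotation formula (step count and axis are illustrative choices):

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: rotation matrix for `angle` radians about `axis`."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def scan_waypoints(axis, n_steps=12):
    """Evenly spaced orientations covering one full turn about one axis."""
    return [rotation_about_axis(axis, a)
            for a in np.linspace(0.0, 2 * np.pi, n_steps, endpoint=False)]

waypoints = scan_waypoints([0, 0, 1])  # one full turn about the z-axis
```

Sweeping the same waypoint pattern over different axes is how the candidate scan strategies are compared for object coverage.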
Testing with Complex Geometries
- Successfully adapted the pipeline for objects with varying geometry (airplane model)
- Adjusted filtering parameters to accommodate different point cloud densities
- Identified limitations in median filtering for complex objects
Automation Progress
- Developed an end-to-end automated pipeline from data collection to visualization
- Improved segmentation through both point-based and bounding box approaches depending on object characteristics
Object Pose Estimation
We evaluated three major pose estimation models and explored alternative approaches for pose tracking. For now, object pose estimation is on hold until the Fall semester, when we do integration.
Main Models Compared
1. Gen6D
- Strengths: Requires only a point-cloud mesh for inference; performs well on one-shot inference
- Limitations: Lacks camera intrinsics integration, limiting tracking capability across frames
2. NVIDIA FoundationPose
- Strengths: Incorporates tracking functionality after initial inference; versatile for object tracking across frames
- Limitations: Requires depth maps and object masks; long inference time (~30 seconds per image)

3. SAM6D
- Strengths: Integrates SAM2 for object segmentation; offers streamlined pipeline for segmentation and pose estimation
- Limitations: Performance degrades with featureless meshes; limited by lack of specific prompts for SAM
Alternative Approaches Explored
1. 4D Pose Estimation
- Uses a two-view setup to estimate pose from two different perspectives
- Provides additional positional information but requires precise camera alignment and calibration
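The positional component of the two-view idea comes down to linear triangulation: a 3D point is recovered from its projections in two calibrated cameras. A self-contained DLT sketch (the intrinsics, baseline, and point below are synthetic, chosen only to illustrate the math):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)        # null vector of A is the point
    X = Vt[-1]
    return X[:3] / X[3]                # dehomogenize

K = np.diag([500.0, 500.0, 1.0])       # simple pinhole intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # first camera
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0], [0]])])  # 0.2 m baseline

X_true = np.array([0.1, -0.05, 1.5])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_est = triangulate(P1, P2, x1, x2)
```

This is also where the calibration sensitivity noted above comes from: errors in P1 and P2 propagate directly into the recovered position.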

2. Homographies for Pose Estimation
- Involves segmenting 3D point cloud, aligning axes and angles, then projecting onto segmented surface
- Less reliant on camera intrinsics but accuracy depends on effective segmentation
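The homography between four or more point correspondences can be estimated with the standard direct linear transform (DLT); a minimal numpy sketch (the point pairs here are synthetic, a pure translation used as a sanity check):

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from 4+ point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    H = Vt[-1].reshape(3, 3)           # null vector, reshaped to 3x3
    return H / H[2, 2]                 # fix the scale ambiguity

# A pure translation by (2, 3) makes the expected H easy to check.
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(x + 2, y + 3) for x, y in src]
H = homography_dlt(src, dst)
```

Because the correspondences come from the segmented surface rather than from camera intrinsics, segmentation quality is the limiting factor, matching the point above.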

3. Face Identification and Alignment
- Focuses on identifying visible faces of objects using YOLO or feature matching
- Calculates homography to align 3D point cloud with segmented face
- Potentially more robust for complex objects but depends on accurate feature detection