Motion Planning
Motion planning currently runs in two phases. In the first phase, the single-arm controller takes in the grasp points from object pose estimation and executes a trajectory for each arm separately. The arms do not need to coordinate at this stage, so they may reach their grasp points at different times. In the second phase, once both arms have grasped the object, they must work together essentially as one system to maneuver it, following the manipulation policy.
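Once both grippers are attached, the object and the two arms form a closed kinematic chain, so both end-effector targets follow rigidly from a single commanded object pose. A minimal sketch of that relationship (frames and grasp offsets below are illustrative, not our actual calibration):

```python
import numpy as np

def pose_to_matrix(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Fixed grasp offsets (object frame -> each gripper), captured at grasp time.
T_obj_left = pose_to_matrix(np.eye(3), [0.0, 0.15, 0.0])
T_obj_right = pose_to_matrix(np.eye(3), [0.0, -0.15, 0.0])

def gripper_targets(T_world_obj):
    """Both gripper poses follow rigidly from the commanded object pose."""
    return T_world_obj @ T_obj_left, T_world_obj @ T_obj_right

# Example: command the object to a new pose; both arm targets come for free.
T_cmd = pose_to_matrix(np.eye(3), [0.3, 0.0, 0.5])
left, right = gripper_targets(T_cmd)
```

This is why phase two can treat the dual-arm system as one plant: the manipulation policy outputs one object trajectory, and the per-arm targets are derived from it.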
Environment Setup
- Successfully configured a dual-arm system in MoveIt2 using the Kinova Kortex platform
- Extended the official ROS2 single-arm package to support two arms in the same environment
- Added environmental objects (Vention table and sample bin) for collision avoidance
- Established collision awareness between all components in the workspace
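In MoveIt2 this kind of dual-arm setup is typically expressed in the SRDF by nesting the two arm groups inside a combined planning group. The fragment below is a sketch with placeholder group and link names, not the actual names from the Kortex description package:

```xml
<!-- Sketch of a dual-arm SRDF; group and link names are placeholders. -->
<robot name="dual_gen3">
  <group name="left_arm">
    <chain base_link="left_base_link" tip_link="left_end_effector_link"/>
  </group>
  <group name="right_arm">
    <chain base_link="right_base_link" tip_link="right_end_effector_link"/>
  </group>
  <!-- The combined group lets MoveIt2 plan for both arms at once. -->
  <group name="both_arms">
    <group name="left_arm"/>
    <group name="right_arm"/>
  </group>
  <!-- Links that can never touch are excluded to speed up collision checks. -->
  <disable_collisions link1="left_base_link" link2="right_base_link" reason="Adjacent"/>
</robot>
```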

Motion Planning Capabilities
- Implemented RRT* planning algorithm for trajectory generation
- Configured ROS2 controllers according to Kinova software requirements
- Achieved simultaneous planning and execution for both arms
- Successfully tested planning and execution in fake hardware mode before connecting to Mujoco simulator or real hardware
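The actual RRT* planning is handled by MoveIt2/OMPL in joint space; purely as an illustration of the algorithm's choose-parent and rewire steps, here is a minimal obstacle-free 2D sketch (all names and parameters are ours, not MoveIt2's):

```python
import numpy as np

rng = np.random.default_rng(0)

def rrt_star(start, goal, n_iter=2000, step=0.5, radius=1.0, bounds=(0.0, 10.0)):
    """Minimal RRT* in an obstacle-free 2D workspace."""
    nodes = [np.asarray(start, float)]
    parent = {0: None}
    cost = {0: 0.0}

    def nearest(p):
        return int(np.argmin([np.linalg.norm(n - p) for n in nodes]))

    for _ in range(n_iter):
        # Sample a random point, with a small bias toward the goal.
        sample = np.asarray(goal, float) if rng.random() < 0.05 else rng.uniform(*bounds, 2)
        i = nearest(sample)
        direction = sample - nodes[i]
        dist = np.linalg.norm(direction)
        if dist < 1e-9:
            continue
        new = nodes[i] + direction / dist * min(step, dist)
        # Choose the lowest-cost parent among nearby nodes (the "*" in RRT*).
        near = [j for j, n in enumerate(nodes) if np.linalg.norm(n - new) <= radius]
        best = min(near, key=lambda j: cost[j] + np.linalg.norm(nodes[j] - new))
        k = len(nodes)
        nodes.append(new)
        parent[k] = best
        cost[k] = cost[best] + np.linalg.norm(nodes[best] - new)
        # Rewire neighbours through the new node when that is cheaper.
        for j in near:
            c = cost[k] + np.linalg.norm(nodes[j] - new)
            if c < cost[j]:
                parent[j], cost[j] = k, c
    # Walk back from the tree node closest to the goal.
    i = nearest(np.asarray(goal, float))
    path = []
    while i is not None:
        path.append(nodes[i])
        i = parent[i]
    return path[::-1]

path = rrt_star((0.5, 0.5), (9.5, 9.5))
```

The rewiring step is what distinguishes RRT* from plain RRT: paths keep shortening as more samples arrive, which matters when both arms must share a cluttered workspace.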


Autonomous Operation
- Developed a dedicated ROS2 package that automates the planning and execution process
- Replaced manual goal-setting in RViz with programmatic goal specification
- Created a node structure that subscribes to pose estimation data
- Established a pipeline where pose data feeds directly into motion execution
3D Reconstruction
Initial Approach with SAM1
- Used bounding box initialization with Segment Anything Model (SAM1) for object segmentation
- Mask propagation struggled with obstructions, resulting in noisy point clouds
Improved Approach with SAM2
- Implemented point-based initialization for more flexible segmentation
- Enhanced mask propagation to track objects across frames
- Added negative selection tool to exclude obstructions and robotic arms
- Results: Reduced noise in 3D reconstructions and minimized manual intervention
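Negative point prompts steer SAM2 away from the arms and obstructions; the end result is equivalent to removing the unwanted region from the object mask before back-projection, as in this synthetic boolean-mask sketch:

```python
import numpy as np

# Synthetic masks: True where the segmenter believes the class is present.
object_mask = np.zeros((6, 6), dtype=bool)
object_mask[1:5, 1:5] = True          # object proposal from positive prompts

arm_mask = np.zeros((6, 6), dtype=bool)
arm_mask[3:6, 3:6] = True             # robot arm overlapping the object

# Excluding the arm region mirrors what the negative prompt achieves.
clean_mask = object_mask & ~arm_mask

# Only pixels inside the cleaned mask get back-projected to the point cloud.
kept_pixels = np.argwhere(clean_mask)
```

Every arm pixel excluded here is one fewer spurious point in the reconstruction, which is why this step directly reduced point-cloud noise.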

Denoising Techniques Development
- Tested multiple denoising methods after finding that SAM2-based segmentation still produced dense noise in the point clouds
- Region Growing: Successfully reduced noise by setting density thresholds
- Normal-based Filtering: Preserved important surface features while removing noise
- Combined Approach: Applied region growing with normals followed by median filtering
- Used bounding box segmentation to address patchiness from distinct segments
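The density-threshold idea behind the region-growing step can be sketched in plain numpy (brute force for clarity; the radius and threshold values are illustrative, and a real pipeline would use a KD-tree):

```python
import numpy as np

def density_filter(points, radius=0.05, min_neighbors=5):
    """Keep points that have at least `min_neighbors` others within `radius`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Subtract 1 so a point does not count itself as its own neighbor.
    neighbors = (dist < radius).sum(axis=1) - 1
    return points[neighbors >= min_neighbors]

rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.01, size=(50, 3))    # dense surface patch
outliers = rng.uniform(1.0, 2.0, size=(5, 3))    # sparse sensor noise
cleaned = density_filter(np.vstack([cluster, outliers]))
```

Sparse outliers fail the neighbor count and are dropped, while the dense surface patch survives intact; normal-based and median filtering then act on what remains.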


Motion Strategies for Improved Scanning
- Evaluating deterministic motion strategies for structured scanning
- Testing 360° rotations along different axes to optimize object exposure
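A deterministic 360° scan about a chosen axis reduces to generating evenly spaced orientation waypoints; a sketch using Rodrigues' rotation formula (step count and axis are illustrative choices):

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: rotation matrix for `angle` radians about `axis`."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def scan_waypoints(axis, n_steps=12):
    """Evenly spaced orientations covering one full turn about one axis."""
    return [rotation_about_axis(axis, a)
            for a in np.linspace(0.0, 2 * np.pi, n_steps, endpoint=False)]

waypoints = scan_waypoints([0, 0, 1])  # one full turn about the z-axis
```

Sweeping the same waypoint pattern over different axes is how the candidate scan strategies are compared for object coverage.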
Testing with Complex Geometries
- Successfully adapted the pipeline for objects with varying geometry (airplane model)
- Adjusted filtering parameters to accommodate different point cloud densities
- Identified limitations in median filtering for complex objects
Automation Progress
- Developed an end-to-end automated pipeline from data collection to visualization
- Improved segmentation through both point-based and bounding box approaches depending on object characteristics
Object Pose Estimation
We evaluated three major pose estimation models and explored alternative approaches for pose tracking. For now, object pose estimation is on hold until the Fall semester, when we do integration.
Main Models Compared
1. Gen6D
- Strengths: Requires only a point-cloud mesh for inference; performs well on one-shot inference
- Limitations: Lacks camera intrinsics integration, limiting tracking capability across frames
2. NVIDIA FoundationPose
- Strengths: Incorporates tracking functionality after initial inference; versatile for object tracking across frames
- Limitations: Requires depth maps and object masks; long inference time (~30 seconds per image)

3. SAM6D
- Strengths: Integrates SAM2 for object segmentation; offers streamlined pipeline for segmentation and pose estimation
- Limitations: Performance degrades with featureless meshes; limited by lack of specific prompts for SAM
Alternative Approaches Explored
1. 4D Pose Estimation
- Uses a two-view setup to estimate pose from two different perspectives
- Provides additional positional information but requires precise camera alignment and calibration
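The positional component of the two-view idea comes down to linear triangulation: a 3D point is recovered from its projections in two calibrated cameras. A self-contained DLT sketch (the intrinsics, baseline, and point below are synthetic, chosen only to illustrate the math):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)        # null vector of A is the point
    X = Vt[-1]
    return X[:3] / X[3]                # dehomogenize

K = np.diag([500.0, 500.0, 1.0])       # simple pinhole intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # first camera
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0], [0]])])  # 0.2 m baseline

X_true = np.array([0.1, -0.05, 1.5])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_est = triangulate(P1, P2, x1, x2)
```

This is also where the calibration sensitivity noted above comes from: errors in P1 and P2 propagate directly into the recovered position.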

2. Homographies for Pose Estimation
- Involves segmenting 3D point cloud, aligning axes and angles, then projecting onto segmented surface
- Less reliant on camera intrinsics but accuracy depends on effective segmentation
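The homography between four or more point correspondences can be estimated with the standard direct linear transform (DLT); a minimal numpy sketch (the point pairs here are synthetic, a pure translation used as a sanity check):

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from 4+ point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    H = Vt[-1].reshape(3, 3)           # null vector, reshaped to 3x3
    return H / H[2, 2]                 # fix the scale ambiguity

# A pure translation by (2, 3) makes the expected H easy to check.
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(x + 2, y + 3) for x, y in src]
H = homography_dlt(src, dst)
```

Because the correspondences come from the segmented surface rather than from camera intrinsics, segmentation quality is the limiting factor, matching the point above.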

3. Face Identification and Alignment
- Focuses on identifying visible faces of objects using YOLO or feature matching
- Calculates homography to align 3D point cloud with segmented face
- Potentially more robust for complex objects but depends on accurate feature detection