FVE Performance Evaluation
– A spam can, a bottle, and a soup can, each with an AprilTag
– The objects are separated by a peripheral distance of at least 10 cm and are placed height-wise, in a specific orientation, within a bounded region
Manipulation Standalone
– Given an object ID, the manipulator should grasp the object without collision in at least 12 out of 20 trials.
Gaze Standalone
– When a non-keyword phrase is spoken and the user gazes at the object, the gaze should accurately identify intent in at least 16 out of 20 trials.
– Speech and gaze, when combined, should correctly identify intent in at least 12 out of 20 trials.
Bounding Box Size Error
– The bounding box error, per the criteria stated in the FVE, should be at most 10%.
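
The pass thresholds above can be written down as a small checker. This is only a minimal sketch: the names (Criterion, CRITERIA, bbox_error_ok) are hypothetical, and the bounding box error is assumed to arrive as a precomputed fraction, since how it is measured is defined by the FVE criteria and not restated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One trial-count criterion: at least min_successes out of trials."""
    name: str
    min_successes: int
    trials: int = 20  # every criterion above is stated over 20 trials

    def passed(self, successes: int) -> bool:
        # Reject counts that cannot occur for this number of trials.
        if not 0 <= successes <= self.trials:
            raise ValueError(f"successes must be in [0, {self.trials}]")
        return successes >= self.min_successes


# Thresholds copied from the criteria listed above; dictionary keys are hypothetical.
CRITERIA = {
    "manipulation": Criterion("Manipulation standalone", 12),
    "gaze": Criterion("Gaze standalone", 16),
    "speech_gaze": Criterion("Speech and gaze combined", 12),
}

# Assumption: the error is expressed as a fraction, so 10% is 0.10.
MAX_BBOX_ERROR = 0.10


def bbox_error_ok(relative_error: float) -> bool:
    """Return True when the bounding box size error is at most 10%."""
    return abs(relative_error) <= MAX_BBOX_ERROR
```

For example, CRITERIA["gaze"].passed(15) returns False because the gaze criterion needs 16 successes, while CRITERIA["manipulation"].passed(12) returns True. These inputs are illustrative values, not measured results.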
Performance Statistics

Fig. 1. FVE Performance Statistics

Table 1. Performance Statistics