Kiosk/Hardware System
To begin, we were provided a Franka Emika Panda mounted on top of a movable cart constructed from extruded aluminum pieces and their associated connectors. This existing structure, however, lacked the infrastructure needed to mount the overhead cameras, to house the ingredient bins, and to place the weighing scales. Additionally, after discussion with Professor Kroemer, we decided to elevate the work surface to roughly the arm’s shoulder-joint height, as this gives the arm the best dexterity and reach. A rendering of this layout is shown in Figure 1a, with the assembled version shown in Figure 1b.


To house the ingredients, we are using 1/4-size cold drop pans. These allow us to store up to 6 ingredients while still being large enough to give the arm ample room to maneuver during the ingredient pick-up process. To house these pans, as well as to support the sandwich assembly area, 1/4-inch acrylic sheets were cut to size. These sheets can be screwed onto the extruded aluminum structure.
Below the work surface, we have mounted two scales: one under the ingredient bins and one under the sandwich assembly area. Together they provide information on the current stock of ingredients as well as the weight of the assembled sandwich. To accommodate the differing bin depths, we designed the structure shown in Figure 2a, which allows us to read the combined weight of the ingredients in all 3 bins. Figure 2b shows the scales placed under the work surface.


Sensing
Weighing Scale
To capture the current stock of ingredients, and to determine the amount of an ingredient in the sandwich assembly area, we purchased a digital weighing scale with gram-level accuracy, an upper limit of 3 kg, and an RS232 interface. When prompted, the scale returns the current weight measurement. A moving-average filter is also used to suppress noise in the measurement process. A photo of the scale as well as example outputs are displayed in Figure 3.
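The filtering step can be sketched as below. This is a minimal illustration of a fixed-window moving average; the class name and window size are illustrative, and the RS232 parsing that produces each gram value depends on the scale's protocol and is not shown.

```python
from collections import deque

class MovingAverageFilter:
    """Fixed-window moving average over successive scale readings (grams)."""

    def __init__(self, window_size=5):
        # deque with maxlen automatically drops the oldest reading
        self.readings = deque(maxlen=window_size)

    def update(self, grams):
        """Add a new raw reading and return the current filtered weight."""
        self.readings.append(grams)
        return sum(self.readings) / len(self.readings)
```

Each raw reading parsed from the RS232 stream would be passed to `update()`, and the returned value used as the stock or sandwich weight estimate.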

Ingredient Pick-Up Pipeline
Before an ingredient can be picked up, the location of the ingredient must be determined in the arm’s base coordinate frame. To achieve this, we have mounted an Intel RealSense D435 camera to the Franka Panda arm. To extract the XYZ pickup location, the first step is to segment the desired ingredient from the image. In our case, this means segmenting the ingredient that is on top of the stack.
To extract a mask that represents the top layer of an ingredient, we are utilizing a UNet deep learning model. The UNet, which directly generates the segmentation mask for the top slice in the stack, was trained on augmented images collected from the RealSense D435, spanning varying lighting conditions, ingredient arrangements, and bin arrangements. After generating the top-layer mask, a custom image-processing pipeline calculates its midpoint. Example UNet results are shown in Figure 4.
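The midpoint computation over the predicted mask can be sketched as a simple centroid of the mask pixels; the actual pipeline may include additional cleanup steps, so this is an assumption-laden simplification.

```python
import numpy as np

def mask_midpoint(mask):
    """Pixel midpoint (u, v) of a binary segmentation mask, or None if
    the mask is empty. mask is a 2D array of 0/1 values."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    # Centroid of all mask pixels serves as the pick-up midpoint
    return float(xs.mean()), float(ys.mean())
```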

Once the top ingredient is segmented, the central point is passed through the pipeline to be used as the XY pickup location. To get the remaining Z coordinate of the pickup point, the depth functionality of the D435 is used to generate a depth map, from which the Z location of the desired point is extracted. Figure 5 displays an example depth map, including the depth at a specified pick-up point. The X, Y, and Z locations parsed from this pipeline are then transformed into the arm’s base frame, which can then be used by the manipulation system to accurately pick up the ingredient.
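The back-projection and frame transform described above can be sketched with a standard pinhole camera model. The intrinsics `fx, fy, cx, cy` would come from the D435's calibration, and `T_base_cam` from a hand-eye calibration; the function name and exact interface are illustrative.

```python
import numpy as np

def pixel_to_base(u, v, depth_m, fx, fy, cx, cy, T_base_cam):
    """Back-project pixel (u, v) with depth (meters) through a pinhole
    model, then transform the resulting point from the camera frame to
    the arm's base frame. T_base_cam is the 4x4 homogeneous transform
    from camera frame to base frame."""
    # Pinhole back-projection into the camera frame
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    p_cam = np.array([x, y, depth_m, 1.0])
    # Homogeneous transform into the base frame
    return (T_base_cam @ p_cam)[:3]
```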

Ingredient Placement Pipeline
After placing the first slice of bread, it is desired that all subsequent ingredients are placed at the midpoint of this slice. Therefore, after placing the first slice, a photo of the assembly area is taken, from which the location of the bread can be extracted. To localize the bread, HSV thresholding is used. After this thresholding, a contour detector locates the edges of the bread, from which the center of the bread can be ascertained. This, coupled with the depth value of the bread, is then logged for later use when placing ingredients.
Manipulation
Ingredient Pick-Up and Placement
To pick up an ingredient, there are two operations that need to be performed. First, the arm must move into the pre-grasp position. To ensure that the arm is able to move into this pose collision-free, trajectory following is used. These trajectories are generated using a minimum-jerk joint trajectory solver. This approach works because our environment is relatively uncluttered.
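The core of a minimum-jerk solver is the fifth-order blending polynomial, which gives zero velocity and acceleration at both endpoints. A minimal sketch, with illustrative names, of interpolating between two joint configurations:

```python
import numpy as np

def min_jerk_trajectory(q0, qf, num_points):
    """Minimum-jerk interpolation between joint configurations q0 and qf.
    Returns an array of shape (num_points, len(q0)); velocity and
    acceleration are zero at both endpoints."""
    q0, qf = np.asarray(q0, float), np.asarray(qf, float)
    s = np.linspace(0.0, 1.0, num_points)      # normalized time t / T
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5   # minimum-jerk polynomial
    return q0 + blend[:, None] * (qf - q0)
```

Each row of the returned array would be streamed to the arm as a joint setpoint at a fixed control rate.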
Once the arm is above the desired bin and the XYZ coordinate of the pick-up position has been read from the sensing subsystem, the pick-up maneuver can begin. The arm first moves to an X, Y, and Z location immediately above the top of the bin and directly above the pickup point. Next, a straight-line pose trajectory is calculated using a trapezoidal velocity profile, with set maximum velocity and acceleration, which determines the speed of the end effector. This desired trajectory is then tracked using an impedance controller with a low impedance in the Z direction, and the pneumatic system is enabled. The arm lowers in a straight line until it exerts a light force onto the ingredient, which is held firmly by the suction from the pneumatic system. The arm is then raised back up in a straight line to the original pre-grasp location, from which it can continue on to place the ingredient on the sandwich.
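The trapezoidal velocity profile can be sketched as below: accelerate at `a_max`, cruise at `v_max`, then decelerate, falling back to a triangular profile when the move is too short to reach `v_max`. Names, the time step, and the sampling scheme are illustrative.

```python
import numpy as np

def trapezoidal_profile(distance, v_max, a_max, dt=0.01):
    """Sampled positions along a straight-line move of the given length,
    following a trapezoidal (or triangular) velocity profile."""
    t_acc = v_max / a_max
    d_acc = 0.5 * a_max * t_acc**2
    if 2 * d_acc >= distance:                  # too short: triangular profile
        t_acc = np.sqrt(distance / a_max)
        v_peak, t_flat = a_max * t_acc, 0.0
    else:
        v_peak = v_max
        t_flat = (distance - 2 * d_acc) / v_max
    t_total = 2 * t_acc + t_flat
    positions = []
    for t in np.arange(0.0, t_total + dt, dt):
        t = min(t, t_total)
        if t < t_acc:                          # accelerating
            p = 0.5 * a_max * t**2
        elif t < t_acc + t_flat:               # cruising at v_peak
            p = 0.5 * a_max * t_acc**2 + v_peak * (t - t_acc)
        else:                                  # decelerating
            td = t_total - t
            p = distance - 0.5 * a_max * td**2
        positions.append(min(p, distance))
    return positions
```

These scalar positions would be mapped onto the straight line from the pre-grasp pose to the pick-up point and fed to the impedance controller as pose setpoints.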
After picking up an ingredient, the arm follows a prerecorded trajectory to a point above the assembly area. The arm then descends, using an impedance controller, to the desired placement point specified by the vision subsystem, and ejects the ingredient.
Collision Checking
Throughout the arm’s movement, it is important that the arm, as well as the end effector, does not collide with the workspace or with itself. To prevent any collisions, we have defined the workspace, including the ingredient bins, in our program. During any motion of the arm, collision checking runs continuously and triggers a stop in motion if any collision is detected.
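A heavily simplified sketch of such a check is below: it tests only the end-effector position against axis-aligned bounding boxes, whereas a full checker would cover the whole arm geometry. All names and the obstacle representation are assumptions for illustration.

```python
def point_in_box(point, box_min, box_max, margin=0.0):
    """True if a 3D point lies inside an axis-aligned box inflated by margin."""
    return all(lo - margin <= p <= hi + margin
               for p, lo, hi in zip(point, box_min, box_max))

def check_collision(ee_position, obstacles, margin=0.01):
    """Return the name of the first obstacle the end-effector point falls
    inside (within a safety margin), or None if the pose is collision-free.
    obstacles maps names to (box_min, box_max) corner tuples in meters."""
    for name, (box_min, box_max) in obstacles.items():
        if point_in_box(ee_position, box_min, box_max, margin):
            return name
    return None
```

During motion, a hit would trigger the stop behavior described above.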
Backend
State Machine
To coordinate the construction of a sandwich, we are using a state-machine architecture. This state machine is implemented in ROS using the YASMIN package. The current state diagram for SNAAK is shown in Figure 6.

The state machine not only encodes the desired behavior under ideal conditions, but also uses sensor data from the camera and weighing scales to detect errors in execution. After an error is detected, recovery behaviors are triggered that either alert the operator to the presence of an error or rectify the error automatically.
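The nominal-flow-plus-recovery pattern can be illustrated with a toy transition table. This is not the YASMIN API or the actual SNAAK state graph; the state names, outcomes, and transitions here are invented for illustration only.

```python
class SandwichStateMachine:
    """Toy state machine: nominal flow plus error-recovery transitions.
    Terminal states ("DONE", "ALERT_OPERATOR") have no outgoing edges."""

    TRANSITIONS = {
        "PICK_INGREDIENT":  {"succeeded": "PLACE_INGREDIENT", "error": "RECOVER"},
        "PLACE_INGREDIENT": {"succeeded": "CHECK_WEIGHT",     "error": "RECOVER"},
        "CHECK_WEIGHT":     {"succeeded": "DONE",             "error": "RECOVER"},
        "RECOVER":          {"succeeded": "PICK_INGREDIENT",  "error": "ALERT_OPERATOR"},
    }

    def __init__(self):
        self.state = "PICK_INGREDIENT"
        self.history = [self.state]

    def step(self, outcome):
        """Advance to the next state given the outcome of the current one."""
        self.state = self.TRANSITIONS[self.state][outcome]
        self.history.append(self.state)
        return self.state
```

In the real system, each state's outcome would be decided by the sensing and manipulation subsystems (e.g. a weight check failing yields an error outcome).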