System Implementation


Kiosk/Hardware System

To begin, we were provided a Franka Emika Panda mounted on top of a movable cart constructed from extruded aluminum pieces and their associated connectors. This existing structure, however, lacked the infrastructure needed to mount the overhead cameras, to house the ingredient bins, and to place the weighing scales. Additionally, after discussion with Professor Kroemer, we decided to elevate the work surface to roughly the arm’s shoulder joint height, as this gives the arm the best dexterity and reach. A rendering of this layout is shown in Figure 1a, with the assembled version shown in Figure 1b.

Figure 1a: SolidWorks Rendering of Kiosk Structure.
Figure 1b: Assembled Kiosk Structure.

To house the ingredients, we are using 1/4-size cold drop pans. These allow us to store up to 6 ingredients while still being large enough to give the arm ample room to maneuver during the ingredient pick-up process. To house these pans, as well as to support the sandwich assembly area, 1/4-inch acrylic sheets were cut to size. These sheets can be screwed onto the extruded aluminum structure.

Below the work surface, we have mounted three scales. These scales, placed under the ingredient bins and under the sandwich assembly area, provide information on the current stock of ingredients, as well as the weight of the assembled sandwich. To accommodate the differing bin depths, we designed the structure shown in Figure 2a. This structure allows us to read the combined weight of the ingredients in all 3 bins. Figure 2b shows the scales placed under the work surface.

Figure 2a: SolidWorks Rendering of Ingredient Bin Support Structure.
Figure 2b: Weighing Scales Mounted Under Work Surface.

To manipulate the sliced ingredients, we opted to use a high-flow suction system. This system, purchased from Piab, is capable of operating at 60 psi while pulling 30 gal/min. With this high flow, the system is able to pick up not only meats and cheeses, but also the relatively porous bread. A diagram of the complete pneumatic system is shown in Figure 3.

Figure 3: Pneumatic System for Ingredient Manipulation.

To manipulate the shredded ingredients, we have chosen to use a claw-like gripper. The chosen gripper is pneumatically actuated and integrates with the high-flow suction system displayed in Figure 3. To switch between the gripping and suction modalities, we have fitted a high-torque servo motor to the end effector, which swings an arm holding the gripper up and down to position it properly for the ingredient type being manipulated. The full gripper assembly is shown in Figure 4.

Figure 4: Gripper Assembly.

Sensing

Weighing Scale

To capture the current stock of ingredients, and to determine the amount of an ingredient in the sandwich assembly area, we purchased a digital weighing scale with gram-level accuracy, an upper limit of 3 kg, and an RS232 interface. When prompted, the scale returns the current weight measurement. A moving median filter is applied to these measurements to filter out noise. A photo of the scale, as well as example outputs, is displayed in Figure 5.

Figure 5: Weighing Scale and Example Output.
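
As an illustration, the snippet below sketches the polling and filtering logic, assuming a pyserial connection; the device path, baud rate, and query command are placeholders, as the real values depend on the scale's RS232 protocol.

    import collections
    import statistics

    import serial  # pyserial

    PORT = "/dev/ttyUSB0"  # assumed device path
    QUERY = b"W\r\n"       # hypothetical "report weight" command

    class FilteredScale:
        """Polls the scale over RS232 and returns a moving-median-filtered weight."""

        def __init__(self, window_size: int = 5):
            self.ser = serial.Serial(PORT, baudrate=9600, timeout=1.0)
            self.window = collections.deque(maxlen=window_size)

        def read_grams(self) -> float:
            self.ser.write(QUERY)
            raw = self.ser.readline().decode(errors="ignore").strip()
            self.window.append(float(raw))          # assumes the reply is a bare number
            return statistics.median(self.window)   # median suppresses spikes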

Sliced Ingredient Pick-Up Pipeline

Before an ingredient can be picked up, its location must be determined in the arm’s base coordinate frame. To achieve this, we have mounted an Intel RealSense D435 camera to the Franka Panda arm. To extract the XYZ pickup location, the first step is to segment the desired ingredient from the image. In our case, this means segmenting the ingredient that is on top of the stack.

To extract a mask that represents the top layer of an ingredient, we are utilizing a UNet deep learning model. The UNet, which directly generates the segmentation mask for the top slice in the stack, was trained on augmented images collected from the RealSense D435, spanning varying lighting conditions, ingredient arrangements, and bin arrangements. After generating the top-layer mask, a custom image-processing pipeline calculates its midpoint. Example UNet results are shown in Figure 6.

Figure 6: Cheese Segmentation Using UNet.
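
The midpoint computation can be as simple as taking the centroid of the binary mask; the following is a minimal OpenCV sketch of that step (the actual pipeline may include additional filtering):

    import cv2
    import numpy as np

    def mask_midpoint(mask: np.ndarray) -> tuple[int, int]:
        """Return the (u, v) pixel centroid of a binary top-slice mask."""
        m = cv2.moments(mask.astype(np.uint8), binaryImage=True)
        if m["m00"] == 0:
            raise ValueError("empty mask: no top slice detected")
        return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])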

Once the top ingredient is segmented, its central point is passed through the pipeline and used as the XY pickup location. To get the remaining Z coordinate of the pickup point, the depth functionality of the D435 is used to generate a depth map, from which the Z value at the desired point is extracted. Figure 7 displays an example depth map, including the depth at a specified pick-up point. The X, Y, and Z locations parsed from this pipeline are transformed into the arm’s base frame, where they can be used by the manipulation system to accurately pick up the ingredient.

Figure 7: Sample Depth Map.
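
A minimal sketch of this deprojection and transform, assuming pyrealsense2 intrinsics and a camera-to-base homogeneous transform T_base_cam obtained from hand-eye calibration:

    import numpy as np
    import pyrealsense2 as rs

    def pixel_to_base_frame(u: int, v: int, depth_m: float,
                            intrinsics: rs.intrinsics,
                            T_base_cam: np.ndarray) -> np.ndarray:
        """Deproject a pixel and depth into the camera frame, then map it
        into the arm's base frame via the 4x4 transform T_base_cam."""
        p_cam = rs.rs2_deproject_pixel_to_point(intrinsics, [float(u), float(v)], depth_m)
        p_cam_h = np.array([*p_cam, 1.0])            # homogeneous coordinates
        return (T_base_cam @ p_cam_h)[:3]            # XYZ in the arm base frame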

Sliced Ingredient Placement and Check Pipeline

All subsequent ingredients should be placed at the midpoint of the first slice of bread. Therefore, after placing this first slice, a photo of the assembly area is taken, from which the location of the bread can be extracted. To localize the bread, HSV thresholding is used. After this thresholding, a contour detector locates the edges of the bread, from which the center of the bread can be ascertained. This center, coupled with the depth value of the bread, is then logged for later use when placing ingredients.
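
A minimal OpenCV sketch of this localization step; the HSV bounds are illustrative placeholders for the empirically tuned values:

    import cv2
    import numpy as np

    BREAD_LOW = np.array([10, 40, 80])     # hypothetical lower HSV bound
    BREAD_HIGH = np.array([30, 255, 255])  # hypothetical upper HSV bound

    def locate_bread(bgr: np.ndarray) -> tuple[int, int]:
        """Return the (u, v) pixel center of the bread in the assembly area."""
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, BREAD_LOW, BREAD_HIGH)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            raise ValueError("no bread found in the assembly area")
        bread = max(contours, key=cv2.contourArea)   # largest blob is the bread
        m = cv2.moments(bread)
        return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])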

To certify that we are meeting our performance requirements, and to inform the decision making of the state machine, we introduced a feature that automates the validation of ingredient placements. To begin, we used the same UNet and classical approaches that locate the ingredients in the bins to locate them in the assembly area. The distance from the center point of the bottom bread slice to the center point of the ingredient mask is then used to determine whether that specific ingredient is within tolerance (3 cm). Example outputs for a slice of cheese and a slice of meat are shown in Figure 8.

Figure 8: Example Sandwich Check Output.
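
The check itself reduces to a distance comparison; a minimal sketch, assuming both center points are expressed in the arm's base frame:

    import numpy as np

    TOLERANCE_M = 0.03  # 3 cm placement tolerance

    def placement_ok(bread_center: np.ndarray, ingredient_center: np.ndarray) -> bool:
        """True if the ingredient landed within tolerance of the bread center (XY)."""
        return np.linalg.norm(bread_center[:2] - ingredient_center[:2]) <= TOLERANCE_M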

Shredded Ingredient Pick-Up Pipeline

To pick up shredded ingredients, we have deployed an offline model-based reinforcement learning (MBRL) approach. Specifically, our chosen approach works by estimating the mass that can be picked up from a 100×100 pixel patch of an RGBD image, given a fixed-depth pickup location underneath the top-most point in the patch. To train this model, we collected over 5000 data points of shredded lettuce and sliced onion pickups. The data collection pipeline consisted of a UI from which the operator could teleoperate the arm and select the desired pickup point. Once a point was selected, the arm would perform a grasp, and the weighing scales under the bins would automatically log the weight picked up. We could then use the image patch around the selected point as the model input and the weight picked up as the output in order to train our MBRL model. Our chosen model architecture is shown in Figure 9.
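
For concreteness, the snippet below sketches a small PyTorch regressor of this general type, mapping a 100×100 RGBD patch to grams; the actual layer configuration is the one shown in Figure 9 and may differ from this illustration.

    import torch
    import torch.nn as nn

    class MassEstimator(nn.Module):
        """Illustrative CNN regressing picked-up mass (grams) from a 100x100 RGBD patch."""

        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(4, 16, 5, stride=2, padding=2), nn.ReLU(),  # 4 channels: RGB + depth
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(64, 1)

        def forward(self, patch: torch.Tensor) -> torch.Tensor:
            return self.head(self.features(patch).flatten(1))  # predicted grams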

Figure 9: Mass Estimation Model Architecture.

Once trained, the inference process consists of the following steps (a sketch of this loop follows the list):
  1. Crop the bin from the image.
  2. Discretize the bin image into patches.
  3. Run inference on the model to determine the estimated weight that will be picked up from each patch.
  4. Use the estimated pick-up weight and other heuristics to determine the optimal patch.
  5. Convert the patch center to the arm base frame.
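
A minimal sketch of this loop, reusing the illustrative MassEstimator above; picking the patch whose estimate is closest to the requested amount stands in for the heuristics in step 4:

    import numpy as np
    import torch

    PATCH = 100  # patch size in pixels

    def best_pickup_pixel(bin_rgbd: np.ndarray, model: MassEstimator,
                          target_grams: float) -> tuple[int, int]:
        """Score each patch of the cropped bin image with the mass model and
        return the center pixel of the best patch."""
        h, w, _ = bin_rgbd.shape
        best, best_err = None, float("inf")
        for v in range(0, h - PATCH + 1, PATCH):
            for u in range(0, w - PATCH + 1, PATCH):
                patch = bin_rgbd[v:v + PATCH, u:u + PATCH]
                x = torch.from_numpy(patch).float().permute(2, 0, 1).unsqueeze(0)
                grams = model(x).item()
                err = abs(grams - target_grams)
                if err < best_err:
                    best, best_err = (u + PATCH // 2, v + PATCH // 2), err
        return best  # then convert to the arm base frame, as in step 5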

An example patch-wise pickup weight estimation is shown in Figure 10.

Figure 10: Example Patch-Wise Grasp Weight Estimate.

Manipulation

Ingredient Pick-Up and Placement

To pick up an ingredient, two operations need to be performed. First, the arm must move into the pre-grasp position. To ensure that the arm moves into this pose collision-free, trajectory following is used. These trajectories are generated using a minimum-jerk joint trajectory solver; this solution works because our environment is relatively uncluttered.
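
A minimal sketch of a minimum-jerk joint trajectory solver, using the standard fifth-order polynomial time scaling:

    import numpy as np

    def min_jerk_joint_traj(q0: np.ndarray, qf: np.ndarray, T: float, dt: float) -> np.ndarray:
        """Minimum-jerk interpolation between joint configurations q0 and qf.

        The fifth-order polynomial s(t) = 10t^3 - 15t^4 + 6t^5 (t normalized to
        [0, 1]) gives zero velocity and acceleration at both endpoints."""
        t = np.linspace(0.0, 1.0, int(T / dt) + 1)
        s = 10 * t**3 - 15 * t**4 + 6 * t**5
        return q0[None, :] + s[:, None] * (qf - q0)[None, :]  # (steps, n_joints)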

Once the arm is above the desired bin and the XYZ coordinates of the pick-up position have been read from the sensing subsystem, the pick-up maneuver can begin. The arm first moves to an X, Y, and Z location immediately above the top of the bin, directly above the pickup point. Next, a straight-line pose trajectory is calculated using a trapezoidal velocity profile with set maximum velocity and acceleration, which determines the speed of the end effector. This desired trajectory is then tracked using an impedance controller with low impedance in the Z direction, and the pneumatic system is enabled. The arm lowers in a straight line until it exerts a light force onto the ingredient, which is then held firmly by the suction from the pneumatic system. The arm is then raised back up in a straight line to the original pre-grasp location, from which it can continue on to place the ingredient on the sandwich.
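
The trapezoidal time scaling can be sketched as follows; the velocity and acceleration limits are illustrative parameters:

    import numpy as np

    def trapezoidal_positions(dist: float, v_max: float, a_max: float, dt: float) -> np.ndarray:
        """Distance-along-line samples for a trapezoidal velocity profile."""
        t_acc = v_max / a_max                    # time to reach cruise speed
        d_acc = 0.5 * a_max * t_acc**2
        if 2 * d_acc > dist:                     # too short to cruise: triangular profile
            t_acc = np.sqrt(dist / a_max)
            v_max = a_max * t_acc
            d_acc = 0.5 * dist
        t_cruise = (dist - 2 * d_acc) / v_max
        T = 2 * t_acc + t_cruise
        pos = []
        for t in np.arange(0.0, T + dt, dt):
            if t < t_acc:                        # accelerate
                pos.append(0.5 * a_max * t**2)
            elif t < t_acc + t_cruise:           # cruise at v_max
                pos.append(d_acc + v_max * (t - t_acc))
            else:                                # decelerate
                td = max(T - t, 0.0)
                pos.append(dist - 0.5 * a_max * td**2)
        return np.asarray(pos)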

After picking up an ingredient, the arm follows a prerecorded trajectory to a point above the assembly area. The arm then descends, using an impedance controller, to the desired placement point specified by the vision subsystem, and ejects the ingredient.

One issue that we encountered was a slight mismatch between the desired height of the end effector and the position that the end effector actually reached. While this varied with location, it was generally a problem when descending below the elbow joint. Due to the precise nature of the pickups and placements, we used iterative learning control to negate this offset. During execution, the arm measures the difference between the desired and actual height and uses it to continually update an estimate of the offset parameter. The equation below governs this process, where $\alpha$ is the learning rate:

$$\hat{z}_{\text{offset}}^{(k+1)} = \hat{z}_{\text{offset}}^{(k)} + \alpha \left( z_{\text{desired}} - z_{\text{actual}} \right)$$
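
In code, the update is a single line applied after each descent; a minimal sketch with an illustrative learning rate:

    class HeightOffsetILC:
        """Iterative-learning-control estimate of the end-effector height offset."""

        def __init__(self, alpha: float = 0.2):  # illustrative learning rate
            self.alpha = alpha
            self.offset = 0.0

        def update(self, z_desired: float, z_actual: float) -> float:
            self.offset += self.alpha * (z_desired - z_actual)
            return self.offset  # added to commanded heights on the next motion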

Collision Checking and Virtual E-Stop

Throughout the arm’s movement, it is important that neither the arm nor the end effector collides with the workspace or with the arm itself. To prevent any collisions, we have defined the workspace, including the ingredient bins, in our program. During any motion of the arm, collision checking runs continuously and triggers a stop in motion if a collision is detected.
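
A minimal sketch of such a check, modeling the workspace as axis-aligned boxes; the box extents below are hypothetical placeholders:

    import numpy as np

    # (min corner, max corner) boxes in the arm base frame, e.g. the work surface
    OBSTACLES = [
        (np.array([0.30, -0.40, 0.00]), np.array([0.80, 0.40, 0.15])),
    ]

    def in_collision(points: np.ndarray, margin: float = 0.02) -> bool:
        """True if any sampled point on the arm or end effector falls inside
        an obstacle box inflated by a safety margin."""
        for lo, hi in OBSTACLES:
            inside = np.all((points >= lo - margin) & (points <= hi + margin), axis=1)
            if inside.any():
                return True  # caller halts the current motion
        return False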

Additionally, we wanted to ensure that the arm would never move while operators or customers were interacting with the workspace. To accomplish this, we have added a virtual e-stop flag for all arm operations, which blocks access to all of the ROS actions that move the arm while it is set.
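
A minimal rclpy sketch of this gating pattern; the node and service names are hypothetical illustrations:

    import rclpy
    from rclpy.node import Node
    from std_srvs.srv import SetBool

    class ArmGate(Node):
        """Virtual e-stop flag that arm action servers check before moving."""

        def __init__(self):
            super().__init__("arm_gate")
            self.estop_set = False
            self.create_service(SetBool, "set_estop", self.on_set_estop)

        def on_set_estop(self, request, response):
            self.estop_set = request.data
            response.success = True
            return response

        def motion_allowed(self) -> bool:
            # Action servers call this before (and during) execution and
            # abort without commanding the arm if it returns False.
            return not self.estop_set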


Backend

Communication

Our system is built on top of the ROS2 Humble framework. The sensing, manipulation, and pneumatic subsystems expose their services and actions on this framework, which are then called by the state machine. This allows for great flexibility and clean modularization of the system.

State Machine

To coordinate the construction of a sandwich, we are using a state machine architecture. This state machine is implemented in ROS2 using the YASMIN package. The current state diagram for SNAAK is shown in Figure 11.

Figure 11: SNAAK State Machine Diagram.
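
A minimal YASMIN sketch of this pattern; the states and transitions are illustrative, and the exact API may vary slightly between YASMIN versions:

    from yasmin import Blackboard, State, StateMachine

    class PickIngredient(State):
        """Illustrative state: pick the next ingredient in the order."""

        def __init__(self):
            super().__init__(outcomes=["picked", "error"])

        def execute(self, blackboard: Blackboard) -> str:
            # Call the manipulation pick action here.
            return "picked"

    class PlaceIngredient(State):
        def __init__(self):
            super().__init__(outcomes=["placed", "error"])

        def execute(self, blackboard: Blackboard) -> str:
            # Call the manipulation place action here.
            return "placed"

    sm = StateMachine(outcomes=["done", "failed"])
    sm.add_state("PICK", PickIngredient(), transitions={"picked": "PLACE", "error": "failed"})
    sm.add_state("PLACE", PlaceIngredient(), transitions={"placed": "done", "error": "failed"})
    outcome = sm()  # runs the machine until a terminal outcome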

Not only does the state machine encode the desired behavior under ideal conditions, but it is also able to use sensor data from the camera and weighing scales to detect errors in execution. After an error is detected, recovery behaviors are triggered to either alert the operator to the presence of an error or rectify the error automatically.

User Interface

Paired with the state machine is a user interface, which handles receiving inputs from the user and the operator. This user interface has two pages. The first is the user page, displayed in Figure 12. This page allows the user to select the desired quantity of each in-stock ingredient. It also displays the nutritional information (calories) of their sandwich, as well as the progress of the assembly.

Figure 12: User Page.

The other page of the UI is the operator page. This page serves two purposes. First, it provides the infrastructure for restocking the kiosk, allowing the operator to select which ingredient they are placing in each bin. Additionally, it provides debugging information, allowing the operator to get real-time insights into the workings of the SNAAK system.

Figure 13: Operator Page.