The Passio SDK has the ability the recognise anything from simple ingredients like blueberries and almonds to complex cooked dishes like a beef burrito with salad and fries.
There are two types of food recognition, each of them with their own strengths and weaknesses.
No, the image is sent to the backend for recognition
Very precise with all types of foods
On average 4-7 seconds
Yes, the recognition is done on the device
Good with single food items, struggles with complex cooked foods
Depending on the hardware of the device, ranges between 50-300ms
The Remote image recognition approach is good when the accuracy of the results is top priority, and waiting for the response is not an issue. This use case is implemented by taking static images from the camera or the library of the device and sending them for recognition in an asynchronous fashion.
The Local model approach is good when speed is of the essence. This use case is implemented using continuous frame recognition from the camera. A callback is registered to capture the results as they are coming from the camera stream.
This API sends an image as base64 format to an LLM on Passio's backend, and returns a list of recognised items. The default behaviour of this function is to resize the image to 512 pixels (longer dimension is resized, the other is calculated to keep aspect ratio). Using the PassioImageResolution
enum, the image can be either resized to 512, 1080 or keep the original resolution.
The response, presented as a list ofPassioAdvisorFoodInfo
objects, contains the name, portion and weight in grams recognised by the LLM, as well as a PassioFoodDataInfo
reference from Passio's nutritional database. To fetch the nutritional data for the PassioFoodDataInfo object, use the fetchFoodItemForDataInfo
function.
To match the portion size of the PassioAdvisorFoodInfo object and the fetched PassioFoodItem, pass the weightGrams to the fetchFoodItemForDataInfo method.
Also, the SDK has the ability to send 7 concurrent image request to the backend, so multiple images can be processed in parallel.
Create a screen where the user can snap one or multiple images using the camera of the device
Upon clicking next, the recognizeImageRemote
is invoked on each of the images in the list
Wait for all of the responses to come, add each results list to a final list of results. When the last asynchronous function is executed, present the final list to the user.
To set up the local model and the continuous scanning mode, the camera preview and the recognition session need to be defined.
To start using camera detection the app must first acquire the permission to open the camera from the user. This permission is not handled by the SDK.
Add the UI element that is responsible for rendering the camera frames:
To start using the camera in your Activity/Fragment, implement the PassioCameraViewProvider interface. By implementing this interface the SDK will use that component as the lifecycle owner of the camera (when that component calls onPause() the camera will stop) and also will provide the Context in order for the camera to start. The component implementing the interface must have a PreviewView in its view hierarchy.
Start by adding the PreviewView to your view hierarchy. Go to your layout.xml and add the following.
This approach is more manual but gives you more flexibility. You need to implement the PassioCameraViewProvider interface and supply the needed LifecycleOwner and the PreviewView added in the initial step.
After the user has granted permission to use the camera, start the SDK camera
PassioCameraFragment is an abstract class that handles Camera permission at runtime as well as starting the Camera process of the SDK. To use the PassioCameraFragment simply extend it in your own fragment and supply the PreviewView that has been added to the view hierarchy in the previous step.
To show the live camera preview, add the DetectionCameraView
to your view
The SDK can detect 3 different categories: VISUAL, BARCODE and PACKAGED. The VISUAL recognition is powered by Passio's neural network and is used to recognize over 4000 food classes. BARCODE, as the name suggests, can be used to scan a barcode located on a branded food. Finally, PACKAGED can detect the name of a branded food. To choose one or more types of detection, a FoodDetectionConfiguration object is defined and the corresponding fields are set. The VISUAL recognition works automatically.
The type of food detection is defined by the FoodDetectionConfiguration
object. To start the Food Recognition process a FoodRecognitionListener also has to be defined. The listener serves as a callback for all the different food detection processes defined by the FoodDetectionConfiguration. When the app is done with food detection, it should clear out the listener to avoid any unwanted UI updates.
Implement the delegate FoodRecognitionDelegate
:
Add the method startFoodDetection()
In viewWillAppear
request authorisation to use the camera and start the recognition:
Stop Food Detection in viewWillDisappear
:
Using the listener and the detection options start the food detection by calling the startFoodDetection method of the SDK.
Stop the food recognition in the onStop() lifecycle callback.
Add the method startFoodDetection()
and register a FoodRecognitionListener
Stop Food Detection on widget dispose:
The FoodCandidates
object that is returned in the recognition callbacks contains three lists:
detectedCandidates
detailing the result of VISUAL detection
barcodeCandidates
detailing the result of BARCODE detection
packagedFoodCandidates
detailing the result of PACKAGED detection
Only the corresponding candidate lists will be populated (e.g. if you define detection types VISUAL and BARCODE, you will never receive a packagedFoodCandidates list in this callback).
A DetectedCandidate represents the result from running Passio's neural network, specialized for detecting foods like apples, salads, burgers etc. The properties of a detected candidate are:
name
passioID (unique identifier used to query the nutritional databse)
confidence (measure of how accurate is the candidate, ranges from 0 to 1)
boundingBox (a rectangle detailing the bounds of the recognised item within the image dimensions)
alternatives (list of alternative foods that are visually or contextually similar to the recognised food)
croppedImage (the image that the recognition was ran on)
To fetch the full nutrition data of a detected candidate use:
Implement the camera screen using the steps above
Create a result view that can have two states: scanning and result
If the callback returns an empty list, show the scanning state. If it returns the result, display the name from the detectedCandidate.name
Example of an image that produces a DetectedCandidate: