Food recognition

The Passio SDK can recognise anything from simple ingredients like blueberries and almonds to complex cooked dishes like a beef burrito with salad and fries.

There are two types of food recognition, each of them with their own strengths and weaknesses.

| Type | Works offline | Precision | Response time |
| --- | --- | --- | --- |
| Remote image recognition | No, the image is sent to the backend for recognition | Very precise with all types of foods | On average 4-7 seconds |
| Local neural network model | Yes, the recognition is done on the device | Good with single food items, struggles with complex cooked foods | Depending on the hardware of the device, ranges between 50-300ms |

The remote image recognition approach is the better choice when accuracy of the results is the top priority and waiting for the response is not an issue. This use case is implemented by taking static images from the camera or the photo library of the device and sending them for recognition asynchronously.

The local model approach is the better choice when speed is of the essence. This use case is implemented using continuous frame recognition from the camera. A callback is registered to capture the results as they arrive from the camera stream.

Remote image recognition

This API sends an image, encoded as base64, to an LLM on Passio's backend and returns a list of recognised items.

The API can recognise raw or prepared foods, barcodes, and nutrition facts tables. The type of recognition is indicated by the resultType enum.

The default behaviour of this function is to resize the image to 512 pixels on its longer dimension (the shorter dimension is calculated to keep the aspect ratio). Using the PassioImageResolution enum, the image can be resized to 512 or 1080 pixels, or kept at its original resolution.

// Load the image to recognise (from the bundle, camera or photo library).
guard let image = UIImage(named: "image1") else { return }

PassioNutritionAI.shared.recognizeImageRemote(image: image) { passioAdvisorFoodInfo in
    print("Food Info: \(passioAdvisorFoodInfo)")
}

public struct PassioAdvisorFoodInfo: Codable {
    public let recognisedName: String
    public let portionSize: String
    public let weightGrams: Double
    public let foodDataInfo: PassioFoodDataInfo
}

The response, presented as a list of PassioAdvisorFoodInfo objects, contains the name, portion and weight in grams recognised by the LLM. These attributes can be used for debugging, but the data from the nutritional database is contained either in foodDataInfo if the result type is a Food Item, or in packagedFoodItem if it's a Barcode or Nutrition Facts. To fetch the nutritional data for a PassioFoodDataInfo object, use the fetchFoodItemForDataInfo function.
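As a sketch, the fetch might look like this (the parameter label on fetchFoodItemForDataInfo and the name property on PassioFoodItem are assumptions; check the SDK reference for the exact signature):

```swift
// Sketch: resolve a remotely recognised item into full nutrition data.
// The parameter label and PassioFoodItem's `name` property are assumed,
// not confirmed by this document.
func loadNutrition(for info: PassioAdvisorFoodInfo) {
    PassioNutritionAI.shared.fetchFoodItemForDataInfo(foodDataInfo: info.foodDataInfo) { foodItem in
        guard let foodItem else { return }
        print("Fetched nutrition data for \(foodItem.name)")
    }
}
```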

UI Example

  1. Create a screen where the user can snap one or multiple images using the camera of the device

  2. Upon clicking next, the recognizeImageRemote is invoked on each of the images in the list

  3. Wait for all of the responses to come, add each results list to a final list of results. When the last asynchronous function is executed, present the final list to the user.
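The steps above can be sketched with a DispatchGroup that waits for every asynchronous response before presenting the combined list (the completion of recognizeImageRemote is assumed to deliver an array of PassioAdvisorFoodInfo, as shown earlier):

```swift
import UIKit

// Sketch: run remote recognition on several images and collect all results.
// Assumes recognizeImageRemote calls its completion with [PassioAdvisorFoodInfo]
// and that completions arrive on a serial (e.g. main) queue.
func recognize(images: [UIImage],
               completion: @escaping ([PassioAdvisorFoodInfo]) -> Void) {
    let group = DispatchGroup()
    var allResults: [PassioAdvisorFoodInfo] = []

    for image in images {
        group.enter()
        PassioNutritionAI.shared.recognizeImageRemote(image: image) { results in
            allResults.append(contentsOf: results)
            group.leave()
        }
    }

    // Present the final list only after the last response has arrived.
    group.notify(queue: .main) {
        completion(allResults)
    }
}
```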

Local neural network model

To set up the local model and the continuous scanning mode, the camera preview and the recognition session need to be defined.

Camera preview

To start using camera detection, the app must first obtain the user's permission to access the camera. This permission is not handled by the SDK.

Add the UI element that is responsible for rendering the camera frames:

var videoLayer: AVCaptureVideoPreviewLayer?

func setupPreviewLayer() {
    guard videoLayer == nil else { return }
    if let videoLayer = passioSDK.getPreviewLayer() {
        self.videoLayer = videoLayer
        videoLayer.frame = view.bounds
        view.layer.insertSublayer(videoLayer, at: 0)
    }
}

Start food detection

The SDK can detect three different categories: VISUAL, BARCODE and PACKAGED. VISUAL recognition is powered by Passio's neural network and is used to recognise over 4000 food classes. BARCODE, as the name suggests, can be used to scan a barcode located on a branded food. Finally, PACKAGED can detect the name of a branded food. To choose one or more types of detection, a FoodDetectionConfiguration object is defined and the corresponding fields are set. VISUAL recognition runs automatically.

The type of food detection is defined by the FoodDetectionConfiguration object. To start the Food Recognition process a FoodRecognitionListener also has to be defined. The listener serves as a callback for all the different food detection processes defined by the FoodDetectionConfiguration. When the app is done with food detection, it should clear out the listener to avoid any unwanted UI updates.

Implement the delegate FoodRecognitionDelegate:

extension PassioQuickStartViewController: FoodRecognitionDelegate {

    func recognitionResults(candidates: FoodCandidates?,
                            image: UIImage?) {
        if let barcodeCandidates = candidates?.barcodeCandidates,
           let candidate = barcodeCandidates.first {
            print("Found barcode: \(candidate.value)")
        }

        if let packagedFoodCandidates = candidates?.packagedFoodCandidates,
           let candidate = packagedFoodCandidates.first {
            print("Found packaged food: \(candidate.packagedFoodCode)")
        }

        if let detectedCandidates = candidates?.detectedCandidates,
           let candidate = detectedCandidates.first {
            print("Found detected food: \(candidate.name)")
        }
    }
}

Add the startFoodDetection() method:

func startFoodDetection() {
    setupPreviewLayer()
                
    let config = FoodDetectionConfiguration(detectVisual: true,
                                            volumeDetectionMode: .none,
                                            detectBarcodes: true,
                                            detectPackagedFood: true)
    passioSDK.startFoodDetection(detectionConfig: config,
                                 foodRecognitionDelegate: self) { ready in
        if !ready {
            print("SDK was not configured correctly")
        }
    }
}

In viewWillAppear request authorisation to use the camera and start the recognition:

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    if AVCaptureDevice.authorizationStatus(for: .video) == .authorized {
        startFoodDetection()
    } else {
        AVCaptureDevice.requestAccess(for: .video) { (granted) in
            if granted {
                DispatchQueue.main.async {
                    self.startFoodDetection()
                }
            } else {
                print("The user didn't grant access to the camera")
            }
        }
    }
}

Stop Food Detection in viewWillDisappear:

override func viewWillDisappear(_ animated: Bool) {
    super.viewWillDisappear(animated)
    passioSDK.stopFoodDetection()
    videoLayer?.removeFromSuperlayer()
    videoLayer = nil
}

The FoodCandidates object that is returned in the recognition callbacks contains three lists:

  • detectedCandidates detailing the result of VISUAL detection

  • barcodeCandidates detailing the result of BARCODE detection

  • packagedFoodCandidates detailing the result of PACKAGED detection

Only the corresponding candidate lists will be populated (e.g. if you define detection types VISUAL and BARCODE, you will never receive a packagedFoodCandidates list in this callback).

Visual detection

A DetectedCandidate represents the result of running Passio's neural network, which is specialised in detecting foods like apples, salads and burgers. The properties of a detected candidate are:

  • name

  • passioID (unique identifier used to query the nutritional database)

  • confidence (a measure of how accurate the candidate is, ranging from 0 to 1)

  • boundingBox (a rectangle detailing the bounds of the recognised item within the image dimensions)

  • alternatives (list of alternative foods that are visually or contextually similar to the recognised food)

  • croppedImage (the image that the recognition was run on)

To fetch the full nutrition data of a detected candidate use:

public func fetchFoodItemFor(passioID: PassioNutritionAISDK.PassioID, completion: @escaping (PassioNutritionAISDK.PassioFoodItem?) -> Void)
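A hedged usage sketch, built on the signature above (the name property on PassioFoodItem is an assumption for illustration):

```swift
// Sketch: fetch full nutrition data for the top visual candidate.
func showNutrition(for candidate: DetectedCandidate) {
    PassioNutritionAI.shared.fetchFoodItemFor(passioID: candidate.passioID) { foodItem in
        guard let foodItem else { return }
        // `name` on PassioFoodItem is assumed, not confirmed by this document.
        print("Fetched nutrition data for \(foodItem.name)")
    }
}
```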

UI Example

  1. Implement the camera screen using the steps above

  2. Create a result view that can have two states: scanning and result

  3. If the callback returns an empty list, show the scanning state. If it returns the result, display the name from the detectedCandidate.name
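Steps 2 and 3 can be sketched with a hypothetical two-state view model (ResultViewState and updateResultView are illustrative names, not part of the SDK):

```swift
// Hypothetical two-state result view model; names are illustrative only.
enum ResultViewState {
    case scanning
    case result(name: String)
}

// Call this from recognitionResults(candidates:image:), e.g. on the main queue.
func updateResultView(_ state: inout ResultViewState,
                      with candidates: FoodCandidates?) {
    if let detectedCandidates = candidates?.detectedCandidates,
       let candidate = detectedCandidates.first {
        state = .result(name: candidate.name)
    } else {
        state = .scanning
    }
}
```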

Example of an image that produces a DetectedCandidate:
