Speech recognition

The recognizeSpeechRemote function is primarily used to fetch food items using a voice command, although it can also be used to extract food items from a recipe in text form.

The input to this function is a free-form string, so the implementing app needs to perform speech-to-text conversion before calling the API.

This function can extract several pieces of data:

  • meal action: whether the food is being added to or removed from the logs

  • meal time (breakfast, dinner, lunch or snack)

  • date of log

  • recognised name from the LLM

  • portion and weight from the LLM

  • nutritional data reference as PassioFoodDataInfo

Example of logging breakfast:

let speech = "I had some scrambled egg whites, turkey bacon, whole grain toast, and a black coffee for breakfast"
PassioNutritionAI.shared.recognizeSpeechRemote(from: speech) { recognitionResult in
    print("Result:- \(recognitionResult)")
}

public struct PassioSpeechRecognitionModel {
    public let action: PassioLogAction?
    public let meal: PassioMealTime?
    public let date: String!
    public let advisorFoodInfo: PassioAdvisorFoodInfo
}

public enum PassioLogAction: String, Codable, CaseIterable {
    case add
    case remove
    case none
}
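
A minimal sketch of consuming the results is shown below. It assumes the completion handler delivers an array of PassioSpeechRecognitionModel; the exact callback signature may differ between SDK versions.

// `speech` is the transcript string from the example above.
PassioNutritionAI.shared.recognizeSpeechRemote(from: speech) { recognitionResults in
    for model in recognitionResults {
        // Skip items the model marked for removal from the logs.
        guard model.action != .remove else { continue }
        print("Meal time: \(String(describing: model.meal))")
        print("Log date:  \(model.date ?? "")")
        print("Food info: \(model.advisorFoodInfo)")
    }
}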

The SDK does not provide functionality to record the voice session; that has to be handled by the app.
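
As a starting point, below is a minimal sketch of speech-to-text using Apple's Speech and AVFoundation frameworks. VoiceLogRecorder is a hypothetical helper, not part of the SDK, and it requires the NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription keys in the app's Info.plist.

import Speech
import AVFoundation

// Hypothetical helper that streams microphone audio into Apple's speech recognizer
// and reports the transcript, which can then be passed to recognizeSpeechRemote.
final class VoiceLogRecorder {

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    // Ask for speech recognition permission before the first recording.
    func requestAuthorization(completion: @escaping (Bool) -> Void) {
        SFSpeechRecognizer.requestAuthorization { status in
            DispatchQueue.main.async { completion(status == .authorized) }
        }
    }

    // Start capturing microphone audio; `onTranscript` receives the latest transcription.
    func startRecording(onTranscript: @escaping (String) -> Void) throws {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.record, mode: .measurement, options: .duckOthers)
        try session.setActive(true, options: .notifyOthersOnDeactivation)

        let request = SFSpeechAudioBufferRecognitionRequest()
        recognitionRequest = request

        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()

        recognitionTask = speechRecognizer?.recognitionTask(with: request) { result, _ in
            if let result = result {
                onTranscript(result.bestTranscription.formattedString)
            }
        }
    }

    // Stop capturing; the last transcript delivered to `onTranscript` is the final one.
    func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionTask?.cancel()
        recognitionRequest = nil
        recognitionTask = nil
    }
}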

UI Example

  1. Create a screen where the user can record the voice-logging command. Make sure to add the appropriate microphone (and, if applicable, speech recognition) usage descriptions to the app's Info.plist.

  2. The UI should let the user start and stop voice recording with a tap of a button. When recording is done, collect the transcribed string and pass it to the SDK to extract the food items (see the sketch after this list).

  3. Once the SDK returns the results, show the list to the user with the option to deselect incorrect predictions.
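
Putting these steps together, the sketch below outlines one possible view model. It assumes the hypothetical VoiceLogRecorder above and an array result from recognizeSpeechRemote; SelectableFoodResult, toggleRecording and the other names are illustrative, not part of the SDK.

import Combine
import Foundation

// Hypothetical view model for the voice-logging screen described above.
final class VoiceLoggingViewModel: ObservableObject {

    // Wraps a prediction so the user can deselect incorrect ones in the list.
    struct SelectableFoodResult: Identifiable {
        let id = UUID()
        let model: PassioSpeechRecognitionModel
        var isSelected = true
    }

    @Published var isRecording = false
    @Published var transcript = ""
    @Published var results: [SelectableFoodResult] = []

    private let recorder = VoiceLogRecorder()

    // Single button handler: first tap starts recording, second tap stops and recognizes.
    func toggleRecording() {
        if isRecording {
            recorder.stopRecording()
            isRecording = false
            recognize(text: transcript)
        } else {
            try? recorder.startRecording { [weak self] text in
                DispatchQueue.main.async { self?.transcript = text }
            }
            isRecording = true
        }
    }

    // Called when the user taps a row to deselect an incorrect prediction.
    func toggleSelection(_ id: UUID) {
        if let index = results.firstIndex(where: { $0.id == id }) {
            results[index].isSelected.toggle()
        }
    }

    // Hand the final transcript to the SDK and keep the predictions selectable.
    private func recognize(text: String) {
        guard !text.isEmpty else { return }
        PassioNutritionAI.shared.recognizeSpeechRemote(from: text) { [weak self] models in
            DispatchQueue.main.async {
                self?.results = models.map { SelectableFoodResult(model: $0) }
            }
        }
    }
}

The items that remain selected can then be logged with whatever food-logging flow the app already uses.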
