Photographs can be used to detect and identify objects in the physical world by performing a visual search (a search query that uses an image as input). Using machine learning models, visual search results can tell users more information about an item – whether it’s a species of plant or an item to purchase.
Design for this feature is based on the following principles:
Keep images clear and legible
Align the camera UI components to the top and bottom edges of the screen, ensuring that text and icons remain legible when placed in front of an image.
Any non-actionable elements displayed in front of the live camera feed should be translucent to minimize obstructing the camera.
Using an image to search for objects introduces unique usage requirements. Overlapping or cropped objects can make it hard to identify an object.
Error states should be communicated with multiple design cues (such as components and motion) and include explanations of how users can improve their search.
The static image object detection feature uses existing Material Design components and new elements specific to interacting with an image. For code samples and demos of new elements (such as object markers), check out the source code for the ML Kit Material Design showcase app on Android.
Key elements across the stages of a static image visual search experience:
1. Top app bar
2. Object marker
3. Tooltip
4. Card
5. Detected image
6. Modal bottom sheet
Top app bar
The top app bar displays information and actions relating to the current view.
The top app bar provides persistent access to the following actions:
- A button to exit the search experience
- A button to submit a photo to search
- A Help section for troubleshooting search issues
Object marker
Object markers are circular, elevated indicators placed in front of the center of a detected object. Each marker is paired with a card at the bottom of the screen, which displays a preview of each object’s results. When the card is scrolled into view, the corresponding object marker increases in size.
Tapping an object marker (or its results card) opens a modal bottom sheet displaying an object’s full visual search results.
Tooltip
Tooltips display informative text when users hover over, focus on, or tap an element.
Tooltips display informative text to users. For example, they express both states (such as with a message that says “Searching…”) and prompt the user to the next step (such as a message that says, “Tap on a dot or card for results”).
Card
Cards contain content and links about a single subject.
Cards provide a preview of an object’s visual search results. They are arranged in a horizontally scrolling carousel, organized based on the horizontal position of each object.
Each card is paired with an object marker. When the card is scrolled into view, its related object marker increases in size. Tapping a card (or its object marker) opens a modal bottom sheet, which displays an object’s full visual search results.
Cards provide a preview of visual search results and can be tapped to open a modal bottom sheet that contains all results. Horizontally scrolling cards emphasize the corresponding object marker.
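The card-to-marker pairing can be modeled as a single selection index shared by the carousel and the markers. A minimal sketch of that idea (class and method names are hypothetical, not taken from the showcase app):

```java
import java.util.List;

// Illustrative sketch: the carousel and the object markers share one
// selection index, so scrolling a card into view enlarges its marker.
class MarkerCarouselSync {
    private final List<String> objectIds; // one entry per detected object
    private int selectedIndex = -1;

    MarkerCarouselSync(List<String> objectIds) {
        this.objectIds = objectIds;
    }

    // Called when the carousel snaps a card into view.
    void onCardShown(int index) {
        if (index >= 0 && index < objectIds.size()) {
            selectedIndex = index;
        }
    }

    // A marker renders emphasized only while its card is the visible one.
    boolean isMarkerEmphasized(int index) {
        return index == selectedIndex;
    }
}
```

In a real app the same index would also handle the reverse direction: tapping a marker scrolls its card into view and opens the bottom sheet.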
Modal bottom sheet
Bottom sheets are surfaces containing supplementary content that are anchored to the bottom of the screen.
Modal bottom sheets provide access to visual search results. Their layout and content depend on your app’s use case, the number of results, and result confidence.
Visual object search from an image happens in three phases:
- Input: Select an image to search
- Recognize: Detect and identify objects
- Communicate: If matching objects are found, display results
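These three phases can be sketched as a small state machine; the names below are illustrative, not part of any ML Kit or Material API:

```java
// Hypothetical sketch of the three visual search phases.
enum SearchPhase { INPUT, RECOGNIZE, COMMUNICATE }

class VisualSearchFlow {
    private SearchPhase phase = SearchPhase.INPUT;

    SearchPhase phase() { return phase; }

    // Input -> Recognize: the user has selected an image to search.
    void onImageSelected() {
        phase = SearchPhase.RECOGNIZE;
    }

    // Recognize -> Communicate only if matches were found; otherwise
    // return to Input so the user can try a different image.
    void onResultsReturned(boolean anyMatches) {
        phase = anyMatches ? SearchPhase.COMMUNICATE : SearchPhase.INPUT;
    }
}
```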
AI-powered systems can adapt over time. Prepare users for change, and help them understand how to train the system.
Visual search begins when a user selects an image. To increase the chances of a successful search, advise users on the types of images most suitable to search.
When one or more objects have been detected from an image, the app should:
- Communicate that the app is awaiting results
- Display search progress
Objects detected by the ML Kit Object Detection & Tracking API are then compared against the set of known images in your image classification model to find matching results.
Even if an object is detected in a photo, that doesn't guarantee matching results will be found. Objects therefore shouldn't be marked as detected until valid search results are returned.
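This gating rule, detect first but mark only after results come back, can be sketched with a hypothetical helper (not an ML Kit API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "don't mark until confirmed" rule: a detection stays
// pending until the visual search returns at least one result for it.
class DetectionGate {
    private final List<String> confirmed = new ArrayList<>();

    // Called when search results come back for a detected object.
    void onSearchCompleted(String objectId, int resultCount) {
        if (resultCount > 0) {
            confirmed.add(objectId); // only now does the object get a marker
        }
    }

    // Only confirmed objects receive markers and carousel cards.
    List<String> markersToShow() {
        return confirmed;
    }
}
```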
The following factors can affect whether or not objects are detected and identified (this list is not exhaustive):
- Poor image quality
- Small object size in image
- Low contrast between an object and its background
- An object is shown from an unrecognizable angle
- The network connection needed to complete the search is lost
Explaining predictions, recommendations, and other AI output to users is critical for building trust.
Results for detected objects are expressed to users by:
- Placing object markers in front of each detected object
- Showing a preview of each object's results on a card (as part of a carousel of cards)
Your app should set a confidence threshold for displaying visual search results. “Confidence” refers to an ML model’s evaluation of how accurate a prediction is. For visual search, the confidence level of each result indicates how similar the model believes it is to the provided image.
If one or more objects in the image have search results, the app should identify those detected objects using object markers and a carousel of cards previewing each object’s results. Tapping on a marker or card opens a modal bottom sheet that shows an object’s results.
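Threshold filtering can be as simple as dropping every result below a cutoff. A minimal sketch, where the 0.7 value is purely illustrative and should be tuned for your model and use case:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of confidence-threshold filtering: only results at or above
// the cutoff are surfaced to the user. 0.7 is an example value.
class ConfidenceFilter {
    static final float THRESHOLD = 0.7f;

    static List<Float> displayable(List<Float> confidences) {
        List<Float> kept = new ArrayList<>();
        for (float c : confidences) {
            if (c >= THRESHOLD) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```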
Evaluating search results
In some cases, visual search results may not meet user expectations, such as in the following scenarios:
No results found
A search can return without matches for several reasons, including:
- An object isn’t a part of, or similar to, the known set of objects
- It was detected from an angle the visual search model doesn’t recognize
- Poor image quality, making key details of the object hard to recognize
Display a banner to explain if there are no results and guide users to a Help section for information on how to improve their search.
If a search returns results with only low-confidence scores, you can ask the user to search again (with tips on improving their search) instead of showing results.
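The two fallback cases above, no matches and only low-confidence matches, plus the normal case can be mapped to three UI states. A hypothetical sketch of that decision:

```java
import java.util.List;

// Sketch mapping search outcomes to the three UI states described above:
// show results, show a "no results" banner, or prompt the user to retry.
class ResultState {
    enum State { SHOW_RESULTS, NO_RESULTS_BANNER, RETRY_PROMPT }

    static State forConfidences(List<Float> confidences, float threshold) {
        if (confidences.isEmpty()) {
            return State.NO_RESULTS_BANNER; // guide users to the Help section
        }
        boolean anyConfident = confidences.stream().anyMatch(c -> c >= threshold);
        // Only low-confidence matches: ask the user to search again with tips.
        return anyConfident ? State.SHOW_RESULTS : State.RETRY_PROMPT;
    }
}
```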
Shrine Material theme
Shrine is a lifestyle and fashion brand that demonstrates how Material Design can be used in e-commerce.
The Shrine app purchase flow lets users perform a visual search for products using a photo.
Shrine’s object markers use a diamond shape to reflect Shrine’s shape style (which uses angled cuts).
To help users match result cards with possible detected objects, object markers typically increase in size when their corresponding result card is selected in the carousel. Instead of changing the object marker’s size to emphasize it, Shrine applies custom color and border styles.
Shrine’s result cards use custom colors, typography, and shape styles.