Building the Best Semantic Search Engine With Loomi
By Shinhe Cho
At Bloomreach, we define “semantic search” as a search engine built with semantic understanding. This goes beyond natural language processing (NLP), which is the ability of machines to understand text the way humans can. Semantic understanding is the ability to parse each word of a search query into its respective attribute and to make sense of ambiguous words. For example, for the query "shirt dress," "shirt" is the style, and "dress" is the product type. Conversely, for the query "dress shirt," "dress" is the style, and "shirt" is the product type.
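To make the "shirt dress" versus "dress shirt" example concrete, here is a minimal sketch in Python. It is not how Loomi parses queries. It assumes a tiny hypothetical vocabulary (`PRODUCT_TYPES`) and a simple heuristic: in English, the last recognized product-type word is usually the thing being described, and the words before it are modifiers.

```python
# Hypothetical vocabulary for illustration only.
PRODUCT_TYPES = {"shirt", "dress", "shoes"}


def parse_query(query: str) -> dict:
    """Label the right-most known product-type term as the product type.

    Every other term is treated as a style or attribute. This is an assumed
    heuristic, not Bloomreach's algorithm.
    """
    terms = query.lower().split()
    # Scan from the right: the head noun of an English phrase usually comes last.
    for i in range(len(terms) - 1, -1, -1):
        if terms[i] in PRODUCT_TYPES:
            return {
                "product_type": terms[i],
                "attributes": terms[:i] + terms[i + 1:],
            }
    return {"product_type": None, "attributes": terms}


print(parse_query("shirt dress"))  # product type: dress, attribute: shirt
print(parse_query("dress shirt"))  # product type: shirt, attribute: dress
```

The same two words produce two different parses, which is the behavior the example above describes.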
Now, the reality is that the market is full of noise on this topic. If you’re doing a Google search for “semantic search,” you’re likely looking for a new site search solution. But what you’ll find on the results page is a litany of vendors who tell you they have semantic search while trying to sell you their products. How are buyers expected to properly evaluate site search solutions without visibility into the search engine itself?
That’s the problem we’re solving for today.
Bloomreach has been a market leader in product discovery solutions for e-commerce for over a decade. We’re proud to have some of the largest enterprise customers in retail and distribution, with a growing share of mid-market customers to boot. Our customers stick with us year after year because our powerful semantic search engine lets us deliver real value in revenue dollars. But how does it actually work? And what’s really inside the engine? Read along to "go under the hood" and learn more about what really powers semantic search at Bloomreach.
How Semantic Search Works
Let’s start with a quick overview of how semantic search engines work. At the most basic level, search has two functions: retrieval and ranking.
- Retrieval — The process of finding the set of products in a catalog that matches what the user is searching for
- Ranking — The process of ordering the retrieved products in a particular sequence
The best search engines will optimize for both retrieval and ranking when trying to determine search intent. First, the search engine retrieves the most relevant products for a user query. Then, the products are ranked in such a way that meets customer and business needs.
Now, let’s dive a bit deeper into these components.
Retrieval is generally evaluated along two dimensions: recall and precision.
- Recall — This measures how much of the relevant catalog the recall set captures for the user query. It’s the number of relevant products retrieved divided by the total number of relevant products available. This metric does not penalize irrelevant products in the results.
- Precision — This measures how precise that recall set is for user intent. It’s the number of relevant products retrieved divided by the total number of retrieved products. There’s a fine balance to consider with precision. For example, if you have 100 relevant products and your search delivers only 10 of them, you may have 100% precision, but your recall is just 10%, which isn’t ideal from a user experience perspective.
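The two definitions above translate directly into code. The sketch below uses made-up product IDs and reproduces the example of 100 relevant products with only 10 of them retrieved.

```python
def recall(retrieved: set, relevant: set) -> float:
    """Relevant products retrieved / all relevant products available."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set, relevant: set) -> float:
    """Relevant products retrieved / all products retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Illustrative data: 100 relevant products, search returns 10 of them.
relevant = {f"sku-{i}" for i in range(100)}
retrieved = {f"sku-{i}" for i in range(10)}

print(precision(retrieved, relevant))  # 1.0 (every result is relevant)
print(recall(retrieved, relevant))     # 0.1 (90 relevant products are missing)
```

Perfect precision with 10% recall means shoppers never see most of what they could buy, which is why the two metrics have to be read together.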
Ranking is based on a number of signals: customer rules (e.g., boosting a specific brand), product performance, what the algorithms are set up to optimize for, and so on. These signals lead to product scores that then determine the order of products.
Other search engines in the market today focus on various aspects of the search process. Some search vendors will apply basic algorithms that impact ranking (i.e., ranking optimization), which changes search results based on user behavior. Others are heavily rule-based or keyword-based and focus on retrieval. These solutions typically have no actual query understanding or built-in intelligence. A team of developers at your company will have to heavily curate and manually tune for things like synonyms, misspellings, assortment gaps, etc.
In both of the examples above, the artificial intelligence (AI) investment is in one part of the search process or one type of AI. This is like painting in a single color — so for example, while blue is a very popular color with a lot of versatility, if you only paint with blue, you're going to limit what you can achieve on a canvas.
In contrast, using a broad set of colors (or algorithms) and applying each where it makes the most sense lets you achieve far more. This is what Bloomreach has done with our core engine, and it’s how we’ve maintained our market-leading position in commerce search.
How Bloomreach Does Semantic Search
Bloomreach’s search experts have set the standard for driving search revenue through a combination of intelligent algorithms and features that touch every part of the search process, with true semantic understanding at the center.
Let’s start from the beginning with three key inputs that impact our search engine’s performance:
- Customer inputs — Think of your business’ product data, merchandising rules, business priorities, etc.
- Bloomreach algorithms — Our proprietary AI-driven algorithms that we’ve developed over years of commerce experience and continually optimize
- Other inputs — This is data that we’re able to collect from your customers’ buying patterns and your products’ performance on your site
All three inputs are critical in ensuring an optimal search experience for your customers — but the second input, Bloomreach’s algorithms, is what sets our solution apart from the rest.
In particular, our customers start with day zero learnings. This means that even before a user starts typing a query into the search bar, you’ve already benefited from years of commerce-specific data that’s incorporated into our algorithms and informs our search retrieval and ranking processes. There’s no waiting for a pixel to collect data or for your merchandisers to create rules. Our semantic engine is already parsing out attributes and applying known synonyms due to Bloomreach’s vast commerce dataset.
Two Modes of Retrieval Using Loomi
Our many years in digital commerce have taught us that balancing recall and precision is a delicate act.
On the one hand, you always want to improve your recall set (i.e., the list of products you show a customer after they hit “enter”) by serving up the largest number of relevant results possible. On the other hand, you want to make sure each of those products is actually relevant to the customer’s search.
There are many reasons why a business would prioritize recall over precision, and vice versa. This is why we’ve created two distinct modes of retrieval that our customers can apply at the query level based on their unique business goals.
Bloomreach’s default retrieval mode prioritizes better recall. This mode takes your product data and applies specific algorithms to enhance the recall set. It all starts with:
- Semantic understanding — We apply semantic understanding from the beginning of the retrieval process by understanding customer intent and parsing product types and attributes from two sources:
  - Search query, or what the customer types into the search bar
  - Product catalog, or the data provided in your product catalog
Loomi, Bloomreach’s AI for e-commerce, enhances our semantic understanding capabilities for search and product catalogs. But Loomi is more than just a tool. What sets Loomi apart is its ability to adapt and learn — it’s data-savvy, proactive, and eager to illuminate actionable insights. Our Loomi-led algorithm controls help you make the most out of your search experience:
- Spell correct — Loomi is triggered when the original query has zero results but results exist for a similar query. The two algorithms are as follows:
  - Term frequency — This is the default mode. It treats the term that appears more frequently in your catalog as the likely candidate for spell correction
  - Closest match — This uses “edit distance,” or the minimum number of single-character changes (insertions, deletions, or substitutions) needed to get from one term to another, to determine the candidate for spell correction
- Query relaxation — Loomi is activated once there are no exact matches found from the user’s initial search query. Semantic understanding recognizes the product type of the query and relaxes the query matching criteria from “match on all terms” to “match on one term” (e.g., the product type), thus reducing null results.
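The controls above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Loomi's implementation: edit distance is taken to be standard Levenshtein distance, `max_distance` is a hypothetical cutoff for what counts as a "similar" term, and the zero-results trigger for spell correction is left to the caller.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or keep
        prev = curr
    return prev[-1]


def spell_correct(term, catalog_terms, mode="term_frequency", max_distance=2):
    """Pick a correction for a term; max_distance is an assumed cutoff."""
    nearby = {}
    for candidate in catalog_terms:
        distance = edit_distance(term, candidate)
        if distance <= max_distance:
            nearby[candidate] = distance
    if not nearby:
        return None
    if mode == "term_frequency":
        # Prefer the candidate that appears most often in the catalog.
        return max(nearby, key=lambda t: (catalog_terms[t], -nearby[t]))
    # "closest_match": prefer the candidate with the smallest edit distance.
    return min(nearby, key=lambda t: (nearby[t], -catalog_terms[t]))


def search(query_terms, product_type, products):
    """Match on all terms; if nothing matches, relax to the product type alone."""
    exact = [p for p in products if all(t in p["terms"] for t in query_terms)]
    if exact or product_type is None:
        return exact
    return [p for p in products if product_type in p["terms"]]


# Made-up catalog term counts and products.
catalog_terms = Counter({"shirt": 120, "skirt": 40, "shorts": 15})
print(spell_correct("skirtt", catalog_terms))                        # shirt
print(spell_correct("skirtt", catalog_terms, mode="closest_match"))  # skirt

products = [
    {"id": "p1", "terms": {"red", "dress"}},
    {"id": "p2", "terms": {"blue", "shirt"}},
]
print(search(["purple", "dress"], "dress", products))  # relaxed match returns p1
```

The misspelling "skirtt" shows why the mode matters: term frequency favors the more common "shirt," while closest match favors "skirt," which is only one edit away.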
Bloomreach’s precision mode starts with Loomi’s default mode and adds on layers of algorithms to prioritize better precision in the recall set, specifically:
- Search recall precision — Loomi helps remove noisy product data from search results
- Product type precision — Utilizing the product type uncovered by our semantic understanding, Loomi identifies a set of product types that should be retained in the recall set for every query. For example, the recall set for “black shoes” will include:
  - All products that match the product type "shoes"
  - The product type “boots,” extracted from the synonym rule "shoes → boots"
  - The product types "heels" and "pumps," identified from user data
- Category precision — Loomi also filters the recall set based on product type match and dominant category. Dominant categories are determined by the categories that the top products in the recall set belong to.
- Facet precision — Finally, Loomi targets and removes facet noise in search results by relying on the products in the dominant categories of that query. For example, the top 50 products for the search query “dress” may belong to the “evening dress,” “maxi dress,” and “cocktail dress” categories. Perhaps the 78th product is a pair of dress shoes that falls under the “men’s shoes” and “women’s shoes” categories. With facet precision turned on, “men’s shoes” and “women’s shoes” would not appear as facets in the recall set.
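Here is a rough sketch of product type precision and facet precision, using the "black shoes" and "dress" examples above. The data structures, the function names, and the choice to define dominant categories as "any category seen among the top 50 products" are all assumptions made for illustration. The actual rules Loomi applies are not spelled out here.

```python
from collections import Counter


def allowed_product_types(query_type, synonym_rules, behavioral_types):
    """Product types to keep in the recall set for a query (assumed inputs)."""
    allowed = {query_type}
    allowed |= set(synonym_rules.get(query_type, ()))     # e.g. shoes -> boots
    allowed |= set(behavioral_types.get(query_type, ()))  # learned from user data
    return allowed


def dominant_categories(ranked_products, top_n=50):
    """Categories that the top-ranked products belong to (top_n is an assumption)."""
    dominant = set()
    for product in ranked_products[:top_n]:
        dominant.update(product["categories"])
    return dominant


def category_facets(ranked_products, top_n=50):
    """Count category facets, keeping only the dominant categories."""
    dominant = dominant_categories(ranked_products, top_n)
    return dict(Counter(
        category
        for product in ranked_products
        for category in product["categories"]
        if category in dominant
    ))


types = allowed_product_types(
    "shoes",
    synonym_rules={"shoes": ["boots"]},
    behavioral_types={"shoes": ["heels", "pumps"]},
)
print(sorted(types))  # ['boots', 'heels', 'pumps', 'shoes']

# 77 dresses followed by a pair of dress shoes in 78th position.
dress_categories = ["evening dress", "maxi dress", "cocktail dress"]
ranked = [{"id": f"d{i}", "categories": [dress_categories[i % 3]]} for i in range(77)]
ranked.append({"id": "s1", "categories": ["men's shoes", "women's shoes"]})

print(sorted(category_facets(ranked)))  # only the three dress categories
```

Because the dress shoes sit outside the top 50, their categories never become dominant, so they contribute no facets.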
Ranking Optimized for RPV
The ordering of the recall set, or ranking, is a critical part of the search process that aligns customer experience with your business goals. We know that users on your site are expecting a personalized search experience that delivers the results they want with each search. We also know that merchandisers are juggling multiple priorities around inventory, promotions, brand requests, conversion goals, and more. Bloomreach’s ranking process takes these variables into account to produce numerical scores that help you understand how and why products are ranked the way they are.
First, the signals:
- Customer rules (hard boost/bury) — You may want all winter clothing boosted to the top row for every search query. Or, you may want anything that’s nearly out of stock pushed down to the bottom of the results page. Bloomreach takes in those signals set by our customers and ranks products accordingly.
- Product performance and global performance — How well your products perform (product views, add-to-carts, conversions, and revenue) for given search queries is taken into account in our optimized ranking. Global performance consists of how well your products perform sitewide, regardless of search query.
- Personalization signals (1:1 and segment-based personalization) — Each visitor has a unique profile that gets updated in real time based on their on-site search, browse, and purchase behavior. These signals can be taken 1:1 or based on a user’s segment.
- Semantic understanding — Our underlying semantic understanding ranks the most relevant products at the top of the recall set by boosting products that match both product type and product attributes.
- SKU-level intelligence — By collecting user behavior data at the SKU level, and not just at the product level, we can respond quickly to variant changes. For example, if a popular SKU goes out of stock, the corresponding ranking for the overall product can be lowered to account for the lack of availability.
These signals then produce several scores, each normalized on a scale of 1 to 100, that allow business users to quickly and efficiently understand the ranking algorithm at work and make adjustments as needed.
- The performance score — Based on the product performance and global performance signals, a “performance score” assesses how well a product has performed. Recent performance data is valued higher.
- The relevance score — Powered by our semantic understanding, the relevance score measures the match of the query to the product.
- The personalization score — With real-time 1:1 personalization, each visitor’s usage patterns are used to compute a personalization score that reflects how strongly a product matches those patterns. For example, if a user always engages with men’s products, then men’s products in the recall set will have a higher personalization score and those products will be ranked higher.
- The merchandising score — For customers using Bloomreach for merchandising, a merchandising score is attached to a product based on boost/bury rules. When you boost a product, Bloomreach increases the product's score, pushing it closer to the beginning of search results. Bloomreach doesn't ignore the product's performance data — your boost or bury rule is just another signal that’s used by the search algorithms to determine the final order of products in the grid.
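To picture how several 1 to 100 scores could feed a single ordering, consider the sketch below. The weights and the weighted-sum blend are purely hypothetical, since the actual combination is not described here. The point is only to show a boost acting as one more signal next to relevance, performance, and personalization rather than overriding them.

```python
# Hypothetical weights; the real blend is not described in this article.
WEIGHTS = {
    "relevance": 0.4,
    "performance": 0.3,
    "personalization": 0.2,
    "merchandising": 0.1,
}


def scale_1_100(values):
    """Min-max scale raw signal values onto a 1 to 100 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [100.0 for _ in values]
    return [1 + 99 * ((v - lo) / (hi - lo)) for v in values]


def final_score(scores, weights=WEIGHTS):
    """Weighted sum of the four scores (an assumed way to combine them)."""
    return sum(weights[name] * scores[name] for name in weights)


def rank(products):
    """Order products from highest to lowest blended score."""
    return sorted(products, key=lambda p: final_score(p["scores"]), reverse=True)


print(scale_1_100([0, 5, 10]))  # [1.0, 50.5, 100.0]

products = [
    {"id": "A", "scores": {"relevance": 90, "performance": 60,
                           "personalization": 50, "merchandising": 50}},
    # B is less relevant but performs well and carries a merchandising boost.
    {"id": "B", "scores": {"relevance": 70, "performance": 80,
                           "personalization": 50, "merchandising": 100}},
]
print([p["id"] for p in rank(products)])  # ['B', 'A']
```

With these made-up numbers, the boosted product edges ahead, but only because its performance score also helps. Lower its performance and the more relevant product wins again.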
From the beginning of a search query to the ongoing ranking optimization happening with each additional input, your search experience with Bloomreach is truly complete.
Further Developing The Best Semantic Search Engine
But that’s not all.
Beyond the core engine, our team of search experts has continued to develop features that solve the most common use cases for our customers. Read through the additional problems we’ve solved in search:
- Partial Part Number Search
- Automatic Query Filtering
- Relevance by Segment
- SKU Select
Let Bloomreach Drive Your Semantic Search
While we’re proud of the engine we’ve built over the last decade and the AI we've trained specifically for e-commerce, we’re also excited about what’s to come at Bloomreach.
We’re thrilled to continue applying the latest machine learning and natural language processing technologies to our core engine and making Loomi smarter, especially when it comes to serving up relevant search results. These developments will help our customers keep creating magical moments in e-commerce while maintaining a competitive edge in a crowded marketplace.
If you’re looking to uncover missed opportunities in search revenue and reduce manual tuning, you need a solution that’s built for commerce and powered by the most intelligent algorithms on the market today, like Loomi. Schedule a personalized demo of Bloomreach Discovery for the best-of-breed tools in search, merchandising, recommendations, and search engine optimization — all in one unified solution.