
Data Science at Bloomreach

By Samit Paul

08/10/2023


Over the past decade, Bloomreach has been at the forefront of delivering a range of products and solutions that use AI-driven technologies to enhance ecommerce experiences. These offerings fall under two primary products: Bloomreach Discovery and Bloomreach Engagement.

Bloomreach Discovery is all about empowering AI-driven product discovery — including our industry-leading ecommerce search solution — to yield the fastest return on investment. Bloomreach Engagement, on the other hand, offers the power to create highly personalized, real-time marketing campaigns that deliver tangible outcomes. It stands as the sole marketing platform that seamlessly integrates customer data, omnichannel automation, AI, and analytics into one cohesive solution.

And driving the innovations behind these two main solutions is the Bloomreach Data Science team. 

The Role of Data Science at Bloomreach

The Bloomreach Data Science team is known for tackling some of the most challenging problems in the ecommerce industry. At the core of our mission lies a vision to understand the ecommerce landscape and enhance our customers' businesses. With a wealth of data accumulated over the years from our in-house database, we’ve established ourselves as industry leaders in several areas. One such area is using machine learning (ML) for product search rankings. By leveraging advanced ML techniques, the team optimizes the search algorithms to provide accurate and relevant search results, improving the user experience and conversion rates.

The team's efforts also extend beyond search ranking improvements. To enhance customization, we’re incorporating merchant-specific signals to further optimize the search experience. Using a learning-to-rank framework, we specify objectives such as revenue or profit as the optimization target. Additionally, the team works on machine learning-based attribute detection from product catalogs, automating the process of extracting key product attributes to enhance our in-house product knowledge base. As a B2B SaaS provider, we also develop these solutions in a merchant-agnostic manner so that they can handle various scenarios, including low- vs. high-traffic merchants, small vs. large catalogs, and diverse geographies and languages.
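To make the idea of a configurable ranking objective concrete, here is a minimal sketch of how a ranking could be scored against either revenue or profit. It uses NDCG, a common learning-to-rank evaluation metric, with the gain taken from whichever objective is selected. The product IDs, the numbers, and the function names are hypothetical and only illustrate the concept; this is not a description of Bloomreach's production ranking code, and it covers evaluation only, not model training.

```python
import math

def dcg(gains):
    """Discounted cumulative gain for gains listed in ranked order."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))

def ndcg(ranked_ids, gain_by_id, k=None):
    """NDCG@k where the gain is whichever objective was selected."""
    k = k or len(ranked_ids)
    actual = [gain_by_id.get(pid, 0.0) for pid in ranked_ids[:k]]
    ideal = sorted(gain_by_id.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical per-product outcomes for a single search query.
products = {
    "sku_a": {"revenue": 120.0, "profit": 10.0},
    "sku_b": {"revenue": 80.0, "profit": 30.0},
    "sku_c": {"revenue": 20.0, "profit": 15.0},
}

def gains_for(objective):
    """Pick the label column that matches the chosen objective."""
    return {pid: values[objective] for pid, values in products.items()}

ranking = ["sku_a", "sku_b", "sku_c"]  # candidate ranking to evaluate
revenue_ndcg = ndcg(ranking, gains_for("revenue"))
profit_ndcg = ndcg(ranking, gains_for("profit"))
print(f"NDCG (revenue): {revenue_ndcg:.3f}")
print(f"NDCG (profit):  {profit_ndcg:.3f}")
```

The same ranking is ideal under the revenue objective but not under the profit objective, which is why the choice of objective changes what the ranking model should learn.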

Recommender systems are another focus area, using personalized algorithms to suggest products based on user preferences, browsing behavior, purchase history, and product metadata. The team is developing cutting-edge models specifically designed to improve customers' shopping experiences at different stages, ensuring a truly personalized journey. Whether a customer has just landed on the homepage or is already in the middle of making a purchase, the team ensures that the most relevant products are displayed to cater to their needs. Furthermore, the team is creating models that will effectively target customers through other channels like email campaigns, with the aim of improving engagement and increasing the repeat customer rate.

The Bloomreach Data Science Tech Stack

To stay ahead in the rapidly evolving field of data science, the Bloomreach team utilizes a robust tech stack featuring both AWS SageMaker and GCP Vertex AI. Python, a versatile programming language, serves as the primary language for data manipulation, modeling, and analysis. The ML engineers are well-versed in TensorFlow, PyTorch, and PyTorch Lightning for distributed model training. We leverage Apache Airflow for orchestrating our batch ML pipelines, and we deploy real-time serving APIs through some of the most reliable and scalable options available in the market, depending on the project requirements. In order to meet SLAs, we leverage techniques like quantization to reduce the model size and improve the model scoring time. We use ONNX for large-scale model deployment and TensorBoard for model training monitoring. The ML engineers also employ frameworks like Spark to develop distributed ETL pipelines.
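As an illustration of why quantization shrinks a model, the sketch below applies symmetric 8-bit quantization to a handful of weights in plain Python. The weight values and function names are made up for the example, and real deployments would rely on the quantization tooling of the serving framework rather than hand-written code like this.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to the int8 range."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to [-127, 127].
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in quantized]

# Hypothetical weights from one layer of a model.
weights = [0.82, -1.27, 0.03, 0.5, -0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight is stored in one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error of at most half a quantization step per weight.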

The team leverages various state-of-the-art deep learning architectures, such as Transformers (e.g., BERT, Vision Transformers), to tackle complex problems related to textual and image data. In the case of product search, we leverage both classical natural language processing (NLP) and information retrieval techniques, as well as state-of-the-art vector representation techniques for both textual and image data. To perform the lookup of these vectors in an efficient manner, the team uses both in-memory indexes (e.g., FAISS for batch inference pipelines) and vector databases (for real-time serving).
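The vector lookup described above boils down to a nearest-neighbor search over embeddings. Below is a minimal brute-force sketch using cosine similarity in plain Python. The product names and the three-dimensional embeddings are hypothetical, and this exhaustive scan stands in for what a library such as FAISS or a vector database does far more efficiently at scale.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k(query, index, k=2):
    """Return the k (product_id, similarity) pairs most similar to the query."""
    q = normalize(query)
    scored = [
        (pid, sum(a * b for a, b in zip(q, normalize(vec))))
        for pid, vec in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical product embeddings (real ones have hundreds of dimensions).
index = {
    "red_dress": [0.9, 0.1, 0.0],
    "crimson_gown": [0.8, 0.2, 0.1],
    "hiking_boots": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for product_id, score in results:
    print(f"{product_id}: {score:.3f}")
```

A brute-force scan compares the query against every product, which is fine for a toy catalog but is exactly the cost that dedicated vector indexes are designed to avoid.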

In order to scale our solution across geographies, we develop multilingual machine learning models to support product catalogs in non-English languages. On the recommender systems front, we use multiple techniques that draw on both behavioral data and product metadata: collaborative filtering, content-based similarity, and hybrid recommender systems that combine the collaborative and content-based approaches. We’ve developed both classical (e.g., matrix factorization) and deep learning-based techniques (BERT, two-tower neural networks, etc.) to solve these problems. We have also developed contextual personalization features, using the multi-armed bandit technique, to better understand our customers’ needs and preferences.
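For readers unfamiliar with multi-armed bandits, here is a minimal sketch of one common variant, Thompson sampling with Bernoulli rewards. It is a simplified, non-contextual simulation: the three "arms" could be three candidate recommendation widgets, and the click-through rates are invented for the example. The article does not state which bandit algorithm the team uses, so treat the choice of Thompson sampling as an assumption.

```python
import random

def thompson_sampling(true_ctrs, rounds=5000, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was shown."""
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    successes = [0] * n_arms  # observed clicks per arm
    failures = [0] * n_arms   # observed non-clicks per arm
    for _ in range(rounds):
        # Draw a plausible CTR for each arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        arm = samples.index(max(samples))
        # Simulate whether the shopper clicks (hypothetical true CTRs).
        if rng.random() < true_ctrs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

pulls = thompson_sampling([0.02, 0.05, 0.20])
print("times each arm was shown:", pulls)
```

The bandit shifts traffic toward the best-performing arm while it is still learning, rather than splitting traffic evenly for the whole experiment.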

Alongside generating precomputed recommendations offline, the team has also developed real-time personalization capabilities by leveraging user interactions to capture individualized user affinities. Lastly, we focus on continuous experimentation of the models we develop. A/B testing is essential in the ecommerce domain, and it allows businesses to compare different versions or variations of algorithms to determine which one yields better results in terms of key metrics like click-through rate (CTR), revenue per visit (RPV), and average order value (AOV). We utilize both classical statistical and Bayesian approaches to analyze the test results. The team also works very closely with the respective engineering teams to productionize various data science models.
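The two styles of A/B test analysis mentioned above can be sketched on a single conversion-style metric. The example below runs a classical two-proportion z-test and a Bayesian comparison using Beta posteriors with uniform priors. The visit and conversion counts are invented, and the function names are illustrative rather than part of any Bloomreach tooling.

```python
import math
import random

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Classical two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Bayesian estimate of P(rate_B > rate_A) with Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(conv_a + 1, n_a - conv_a + 1)
        b = rng.betavariate(conv_b + 1, n_b - conv_b + 1)
        wins += b > a
    return wins / draws

# Hypothetical experiment: 10,000 visits per variant.
z, p_value = two_proportion_z_test(200, 10000, 260, 10000)
p_better = prob_b_beats_a(200, 10000, 260, 10000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"P(B beats A) = {p_better:.3f}")
```

The classical test answers "how surprising is this difference if the variants were identical?", while the Bayesian estimate answers "how likely is it that B is actually better?", which is often easier to communicate to business stakeholders.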

Looking Ahead to the Future

As pioneers in the field, the Bloomreach data science team constantly pushes the boundaries of innovation. We actively explore future-looking problems such as conversational commerce and visual search by leveraging the latest breakthroughs in deep learning-based large foundation models and generative AI.

Conversational commerce aims to enhance the shopping experience by integrating NLP and machine learning techniques, allowing users to interact with chatbots or virtual assistants to make purchases or seek product recommendations.

Visual search involves developing algorithms that can understand and interpret images, enabling users to search for products by uploading images instead of using keywords. Object detection techniques like DETR (DEtection TRansformer) are being used as part of our visual search exploration to identify fashion-related objects in a picture, which can help with searching for visually similar clothing. We are also experimenting with generative AI to build virtual try-on capabilities for our fashion and home decor merchants. This involves expanding the boundaries of image segmentation, pose prediction, pose transfer, and image generation. We’re experimenting with generative adversarial networks (GANs) and diffusion models for these tasks.

Powering Data Science Innovation at Bloomreach

At the core of the Bloomreach Data Science team are machine learning engineers who have graduated from some of the world's most prestigious colleges and universities. These experts possess a strong foundation in machine learning algorithms, statistical modeling, and data analysis techniques. Their expertise enables them to develop the advanced ML models and algorithms that power Bloomreach's products and solutions.

The Bloomreach Data Science team is truly diverse, with members located in the United States, Europe, and India. This geographical spread brings together a range of perspectives, experiences, and cultural backgrounds, fostering creativity and innovation within the team. Such diversity ensures that the team can tackle problems from different angles and provide comprehensive solutions. The team represents the epitome of innovation and expertise, and with a global network of talented professionals, top-notch machine learning engineers, and a cutting-edge tech stack, they are at the forefront of solving complex problems in the ecommerce industry. From developing ML-based ranking algorithms to exploring futuristic challenges like visual search and conversational commerce, the team's contributions continue to shape the future of online shopping.

With the team's dedication and passion, Bloomreach is well-positioned to remain a leader in the ecommerce space, delivering personalized and seamless experiences to customers worldwide. If you’d like to stay in the know about how we’re working with AI to shape the future of ecommerce, watch The Edge Summit on demand to learn how AI is changing ecommerce forever.


Samit Paul

Head of Data Sciences

Samit is a distinguished professional who leads the Data Science team at Bloomreach. With a strong foundation in research gained from his early career at General Electric Global Research, Samit has honed his expertise in developing statistical and machine learning models across diverse business segments. Following his tenure at GE, he contributed his talents to renowned companies such as Intuit, American Express, and Yodlee. In these roles, Samit played a pivotal role in spearheading the development of several scalable, practical AI-driven solutions, fostering a culture of innovation guided by the latest AI trends. His extensive experience and visionary leadership continue to drive the Data Science team at Bloomreach to deliver impactful AI-driven innovations.
