Nowadays, recommendation systems are widely used across online platforms, especially e-commerce platforms. Amazon in the United States and Taobao in China have both developed their own recommendation algorithms. These algorithms find association rules between customers’ previous consumption behaviors and their later purchases, which lets them make better predictions about what each customer will buy next. The improved systems overcome problems of earlier recommendation models and are better suited to the big data environment.
Keywords: recommendation system, Amazon, Taobao, customer behaviors, big data
The development of networks and technology has changed how people communicate. New communication technology has brought many benefits, such as faster sending and receiving of information. At the same time, it has created new problems. One of the most significant is information overload: people are surrounded by so much information that it is hard for them to find the most relevant pieces on their own. Recommendation systems are a technology designed to overcome this difficulty of finding appropriate information (Öztürk & Cicekli, 2011).
A recommendation system is a computer-implemented service that recommends items from a database. It helps customers find the most relevant data in a large database quickly. Recommendation systems draw on techniques such as information retrieval, customer modeling, and machine learning, and their recommendations are customized to particular customers based on the data known about them. The input to a recommendation system includes attributes of the products and attributes of the customers. The output is typically a numeric score measuring how strongly the algorithm “believes” that a particular customer will enjoy the recommended content or buy the recommended product.
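To make this input/output description concrete, the following Python sketch turns customer and product attributes into a score between 0 and 1. The attribute names, the weights, and the logistic scoring rule are illustrative assumptions, not the model of any particular platform.

```python
import math

def predict_score(customer_prefs: dict[str, float], product_attrs: dict[str, float]) -> float:
    """Score in (0, 1): how strongly the model 'believes' the customer will like the product.

    Hypothetical linear model: weight each product attribute by the customer's
    preference for it, sum the results, then squash with a logistic function.
    """
    raw = sum(weight * product_attrs.get(attr, 0.0) for attr, weight in customer_prefs.items())
    return 1.0 / (1.0 + math.exp(-raw))

# Hypothetical customer who likes science fiction and dislikes romance
customer = {"sci_fi": 2.0, "romance": -1.0}
print(round(predict_score(customer, {"sci_fi": 1.0}), 2))   # 0.88
print(round(predict_score(customer, {"romance": 1.0}), 2))  # 0.27
```

Real systems learn such weights from data instead of setting them by hand. The logistic function is used here only because it maps any weighted sum into a bounded, score-like range.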
Recommendation systems work in different domains. For example, the Netflix recommendation system suggests relevant matches to its customers from a large catalog of movies and TV shows. The YouTube recommendation system “guesses” users’ preferences from data and places the videos each user is most likely to be interested in, out of millions, on that user’s landing page. Another common application is recommending products to online customers on e-commerce platforms. E-commerce platforms assume that people do not buy things at random. Purchases depend on a number of factors, and if a recommendation algorithm can estimate those factors for each customer, it can accurately predict that customer’s later purchases (Jacobi, Benson, & Linden, 2011). Recommendation systems can also find association rules of the form “People who buy X are also likely to buy Y”. Such rules capture customer behavior and help the right products reach their potential customers.
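A rule such as “People who buy X are also likely to buy Y” can be sketched by counting baskets. The Python example below assumes purchase histories are available as sets of item IDs. It measures each rule by its confidence, the share of baskets containing X that also contain Y. The thresholds and item names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def association_rules(baskets, min_count=2, min_confidence=0.5):
    """Mine rules 'people who buy X also buy Y' from purchase baskets.

    confidence(X -> Y) = (#baskets with X and Y) / (#baskets with X)
    Returns {(X, Y): confidence} for rules passing both thresholds.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # every ordered pair (X, Y)
    rules = {}
    for (x, y), together in pair_counts.items():
        confidence = together / item_counts[x]
        if together >= min_count and confidence >= min_confidence:
            rules[(x, y)] = confidence
    return rules

baskets = [{"tent", "stove"}, {"tent", "stove", "lamp"}, {"tent", "lamp"}, {"book"}]
rules = association_rules(baskets)
print(rules[("stove", "tent")])  # 1.0: everyone who bought a stove also bought a tent
```

The rules are asymmetric. Every stove buyer bought a tent, but only two of three tent buyers bought a stove.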
Three Main Types of Recommendation System
Recommendation systems can be broadly divided into three main types according to the approach they use: content-based recommendation, collaborative recommendation, and hybrid recommendation (Öztürk & Cicekli, 2011).
A content-based recommendation system is also called a feature-based recommendation system. It analyzes the features of the content in the item set and matches them to the features of the customers. This matching is based on a user model built by analyzing customers’ previous actions (Wang, Zhang, & Vassileva, 2010). Pure content-based recommendation systems suggest items based on their similarity to the content of items the customer selected before. On social media platforms such as Instagram and Facebook, content-based recommendation is quite useful because similarity can be computed directly from the posted content.
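The following Python sketch shows the content-based idea under simple assumptions. Each item is a sparse vector of hypothetical tag weights, and the user model is the sum of the vectors of previously selected items. Unseen items are then ranked by cosine similarity to that profile.

```python
import math

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def content_based(item_features, selected, top_n=2):
    """Recommend unseen items whose features best match the user's profile."""
    # User model: sum of the feature vectors of previously selected items
    profile: dict[str, float] = {}
    for item in selected:
        for feature, weight in item_features[item].items():
            profile[feature] = profile.get(feature, 0.0) + weight
    candidates = [i for i in item_features if i not in selected]
    candidates.sort(key=lambda i: cosine(profile, item_features[i]), reverse=True)
    return candidates[:top_n]

posts = {
    "post1": {"travel": 1.0, "food": 1.0},
    "post2": {"travel": 1.0},
    "post3": {"food": 1.0, "cooking": 1.0},
    "post4": {"sports": 1.0},
}
print(content_based(posts, {"post1"}))  # ['post2', 'post3']
```

No other customer's data is used here. This is why the approach cannot judge an item's popularity and needs extractable content to work at all.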
However, content-based recommendation systems generally do not provide any mechanism for evaluating the quality or popularity of an item (Jacobi, Benson, & Linden, 2011). In addition, content-based recommendation requires that items include some form of content from which algorithms can extract features. As a result, content-based recommendation systems are largely limited to domains such as social media. They are not suitable for recommending products, movies, books, restaurants, and other types of items that include little or no content algorithms can use to distinguish them.
A collaborative filtering recommendation system is also called a social recommendation system. In collaborative filtering, items are recommended to customers based on the interests of a community of customers, without analyzing the items’ content (Jacobi, Benson, & Linden, 2011). Items are suggested according to the similarity between users with similar habits. Collaborative filtering relies on customers to rate individual items from a list of popular items. Through this process, each customer builds a personal profile of rating data. The system statistically correlates users based on their profiles and assumes that people who behaved similarly in the past will continue to buy similar products in the future. On that basis, items rated highly by one group of customers are recommended to similar customers in their “community”. An important benefit of collaborative filtering is that it overcomes the deficiencies of content-based recommendation, and it is widely used to recommend movies, books, and products on e-commerce platforms.
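The community-based prediction described above can be sketched in Python as user-based collaborative filtering. The example treats each customer's ratings as a sparse vector and defines the “community” as the k most similar customers by cosine similarity. A missing rating is predicted as the similarity-weighted average of the neighbours’ ratings. The customer names, ratings, and choice of k are hypothetical.

```python
import math

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two customers' sparse rating vectors."""
    dot = sum(r * b[item] for item, r in a.items() if item in b)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_user_based(ratings, target, k=2, top_n=3):
    """Recommend items rated highly by the target's k most similar customers."""
    # The target's "community": the k other customers with the most similar profiles
    neighbours = sorted(
        ((cosine(ratings[target], r), user) for user, r in ratings.items() if user != target),
        reverse=True,
    )[:k]
    weighted_sum: dict[str, float] = {}
    total_sim: dict[str, float] = {}
    for sim, user in neighbours:
        if sim <= 0:
            continue  # customers with nothing in common contribute nothing
        for item, rating in ratings[user].items():
            if item in ratings[target]:
                continue  # skip items the target has already rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            total_sim[item] = total_sim.get(item, 0.0) + sim
    # Predicted rating = similarity-weighted average of the neighbours' ratings
    predicted = {item: weighted_sum[item] / total_sim[item] for item in weighted_sum}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},
    "carol": {"A": 4, "B": 5, "D": 2},
    "dave": {"E": 5},
}
print(recommend_user_based(ratings, "alice"))  # ['C', 'D']
print(recommend_user_based(ratings, "dave"))   # []
```

The second call shows the weakness for unusual tastes. Nobody shares dave's ratings, so he receives no recommendations.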
Of course, pure collaborative filtering recommendation systems have deficiencies as well. One significant problem is that customers may tire of rating items to build up their personal profiles. This process can be frustrating and time-consuming, particularly when customers are not familiar with many of the items presented for rating. In addition, rating is a prerequisite for this kind of system. An item that has just been launched on an e-commerce platform and has not yet been rated by many customers therefore faces a “cold start” problem: the service cannot be brought online until a threshold quantity of rating data has been collected (Jacobi, Benson, & Linden, 2011). Furthermore, because collaborative filtering relies on a community of “similar customers”, it is poorly suited to customers with unusual tastes.
In addition, content-based and collaborative filtering recommendation systems share a common disadvantage: they generally cannot reflect customers’ latest preferences. Both need time to update their data before recommendation results can be refreshed. This can lead to inaccurate recommendations, including recommendations for items that customers have already purchased.
Hybrid Recommendation System
A hybrid recommendation system combines the previous methods to obtain better performance. For example, Adsorption is a hybrid recommendation approach that selects appropriate videos or movies for customers and has been applied successfully to YouTube. YouTube offers millions of videos, and each customer’s feed is customized using big data and machine learning. First, rating information is collected as in collaborative filtering. Then, content-based features are injected to produce a hybrid system. Adsorption uses this rating information to reach unrated videos through a graph-based algorithm.
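The graph-based step can be illustrated with a simplified label-propagation sketch in the spirit of Adsorption. This is not the published algorithm. A single fixed `alpha` weighting replaces Adsorption's per-node injection, continuation, and abandonment probabilities. Viewing edges stand in for the collected rating information, and all node names are hypothetical. Each video node is seeded with its own identity as a label, and labels spread across the user-video graph until they reach users who have not yet seen those videos.

```python
def propagate_labels(graph, seeds, alpha=0.5, iterations=10):
    """Simplified label propagation in the spirit of Adsorption.

    graph: node -> list of neighbours (undirected edges listed in both directions).
    seeds: node -> {label: weight} injected at that node on every iteration.
    Each node keeps `alpha` of its own injected labels and takes `1 - alpha`
    of the average label distribution of its neighbours.
    """
    labels = {node: dict(seeds.get(node, {})) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            dist = {lbl: alpha * w for lbl, w in seeds.get(node, {}).items()}
            for nb in neighbours:
                for lbl, w in labels[nb].items():
                    dist[lbl] = dist.get(lbl, 0.0) + (1 - alpha) * w / len(neighbours)
            updated[node] = dist
        labels = updated
    return labels

def recommend(labels, graph, user, top_n=1):
    """Highest-weighted video labels at the user node that the user has not watched."""
    watched = set(graph[user])
    dist = labels[user]
    unseen = [v for v in dist if v not in watched]
    return sorted(unseen, key=dist.get, reverse=True)[:top_n]

# Hypothetical user-video graph: u1 watched v1 and v2; u2 watched v2 and v3
graph = {
    "u1": ["v1", "v2"], "u2": ["v2", "v3"],
    "v1": ["u1"], "v2": ["u1", "u2"], "v3": ["u2"],
}
# Each video node is seeded with its own identity as a label
seeds = {v: {v: 1.0} for v in ("v1", "v2", "v3")}
labels = propagate_labels(graph, seeds)
print(recommend(labels, graph, "u1"))  # ['v3'], reached through the shared video v2
```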
Fig. 1. General System Architecture of YouTube Recommendation System
Retrieved from: https://link.springer.com/chapter/10.1007/978-3-642-21827-9_42
Advantages of Recommendation System Used in E-Commerce Platforms
Recommendation systems can save customers the time and effort of finding the information they need online. Processing the information on web pages and navigating the web can take customers a significant amount of time. It also requires higher cognitive processes such as generalization and categorization (Ševce, Tvarožek, & Bieliková, 2010).
Personalizing the customer experience is also all about data. By applying big data and machine learning, recommendation systems can shape the overall customer experience. As Steve Jobs said, “A lot of times, people don’t know what they want until you show it to them.” Without help, people can easily get lost in large amounts of data. Recommendation systems help them “discover” items they will like but would be unlikely to find by themselves. A recommendation system can improve a visitor’s experience by offering relevant items at the right time and on the right page. Facilitating such serendipitous discovery, and thereby boosting subscriber numbers through engagement and stickiness, has become a high-stakes, multi-billion-dollar race among the world’s biggest digital companies (Arora, 2016).
A good recommendation system reacts immediately to changes in a customer’s data. It makes compelling recommendations for all customers, regardless of how many purchases and ratings they have, and its processing normally takes less than a second.
Previous E-commerce Recommendation Algorithms and Their Disadvantages
The recommendation system is best known for its use on e-commerce platforms. There, it takes a customer’s interests and attributes as input, predicts their potential consumption behavior, and generates a list of recommended items. Customer attributes include not only purchases and ratings but also the items customers viewed, their demographic information, and their subject interests. Recommendation algorithms can analyze all of these as input data.
However, e-commerce recommendation algorithms face several challenges. First, although data can be processed quickly, customer data is volatile and every interaction between customers and products is valuable, so it is hard for algorithms to respond to all new information immediately. Second, both the speed of reaction and the quality of results matter, and many applications require recommendations to be returned in real time and at high quality. Third, new customers typically have extremely limited information, based on only a few purchases or product ratings. Recommendations based on so little data can be less accurate than those for established customers. Finally, established customers present the opposite challenge. They have a glut of information based on thousands of purchases and ratings, and contradictory information within it can also distort the outcomes of the recommendation system.
Most earlier recommendation systems on e-commerce platforms worked by finding customers who purchased and rated similar items and grouping these customers into a community (Linden, Smith, & York, 2003). The algorithm then aggregates the items purchased by customers in the community, eliminates the items the target customer has already bought, and recommends the remaining highly rated items to that customer. This is collaborative filtering, which focuses on similar customers rather than on the content of the items.
Another kind of recommendation algorithm, the content-based approach, focuses on similar items rather than similar customers. For each of the user’s purchased and rated items, the algorithm attempts to find similar items. It then aggregates those similar items and recommends them (Linden, Smith, & York, 2003).
Traditional Collaborative Filtering
The goal of a collaborative filtering algorithm is to suggest new items, or to predict the utility of a certain item for a particular customer. It does so based on the customer’s previous likings and the opinions of other like-minded customers (Sarwar, Karypis, Konstan, & Riedl, 2001). A traditional collaborative filtering algorithm represents a customer as an N-dimensional vector of items, where N is the number of distinct catalog items. It ranks each item according to how many similar customers have purchased or rated it. It then selects recommendations from similar customers’ items, and similarity can be measured in various ways. A common method is to measure the cosine of the angle between the two customer vectors, using the following formula.
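$$\mathrm{similarity}(\vec{A}, \vec{B}) = \cos(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{\lVert \vec{A} \rVert \, \lVert \vec{B} \rVert}$$
Formula to measure the similarity of two customers, A and B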
However, for very large data sets the customer vectors are extremely sparse, and the online computation grows with both the number of customers and the number of catalog items (Linden, Smith, & York, 2003). Generating high-quality recommendations with the traditional collaborative filtering algorithm at this scale is therefore expensive. This high cost may be the main reason most e-commerce platforms do not use the algorithm to recommend items to their customers.
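The traditional algorithm can be sketched in Python with purchases stored sparsely as sets of item IDs instead of full N-dimensional 0/1 vectors. For 0/1 vectors, the dot product is the number of shared items, and each norm is the square root of the number of items bought. As an assumption of this sketch, each candidate item is scored by the summed similarity of the customers who bought it, a weighted version of “how many similar customers” purchased it. The customer and item names are hypothetical.

```python
import math

def cosine_binary(a: set, b: set) -> float:
    """Cosine of two 0/1 purchase vectors stored sparsely as sets of item IDs.

    For 0/1 vectors, A . B = |A & B| and ||A|| = sqrt(|A|).
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def traditional_cf(purchases, target, top_n=2):
    """Rank unseen items by the summed similarity of the customers who bought them."""
    mine = purchases[target]
    scores: dict[str, float] = {}
    # Scans every other customer on each request, so cost grows with the customer base
    for customer, items in purchases.items():
        if customer == target:
            continue
        sim = cosine_binary(mine, items)
        if sim == 0.0:
            continue
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

purchases = {"A": {"x", "y"}, "B": {"x", "y", "z"}, "C": {"x", "w"}, "D": {"q"}}
print(traditional_cf(purchases, "A"))  # ['z', 'w']
```

The loop over every customer on each request is the cost problem described above. With millions of customers and items, this scan becomes too slow to run online.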
The cluster model algorithm divides the customer base into many segments according to specific criteria, grouping the most similar customers together, and turns the recommendation process into a classification problem. Its goal is to assign the user to the segment containing the most similar customers.
Unlike the traditional collaborative filtering algorithm, cluster models avoid the cost of sparse, customer-by-customer comparison. They compare each user to a controlled number of segments rather than to the entire customer base (Linden, Smith, & York, 2003). However, recommendation quality is low because the segmentation is only approximate. A cluster model may not place the user in a group containing the most similar customers, so the recommendations it produces are less relevant.
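A minimal Python sketch of the cluster model follows. It assumes the segments already exist, and their names and members are hypothetical. Each segment is summarized by a profile, the share of its customers who bought each item. The user is assigned to the segment whose profile is most similar by cosine, and that segment's popular items are recommended.

```python
import math
from collections import Counter

def centroid(members: list[set]) -> dict[str, float]:
    """Segment profile: fraction of the segment's customers who bought each item."""
    counts = Counter(item for basket in members for item in basket)
    return {item: n / len(members) for item, n in counts.items()}

def similarity(user: set, profile: dict[str, float]) -> float:
    """Cosine between a user's 0/1 purchase vector and a segment profile."""
    dot = sum(profile.get(item, 0.0) for item in user)
    norm = math.sqrt(len(user)) * math.sqrt(sum(v * v for v in profile.values()))
    return dot / norm if norm else 0.0

def cluster_recommend(user: set, segments: dict[str, list[set]], top_n=2):
    """Assign the user to the closest segment, then recommend that segment's popular items."""
    # In practice segment profiles would be computed offline; done inline here for brevity
    profiles = {name: centroid(members) for name, members in segments.items()}
    # Compare against a handful of segments instead of every customer
    best = max(profiles, key=lambda name: similarity(user, profiles[name]))
    profile = profiles[best]
    recs = sorted((i for i in profile if i not in user), key=profile.get, reverse=True)
    return best, recs[:top_n]

segments = {
    "outdoor": [{"tent", "stove"}, {"tent", "boots"}],
    "kitchen": [{"pan", "knife"}, {"pan", "kettle"}],
}
best, recs = cluster_recommend({"tent"}, segments)
print(best, sorted(recs))  # outdoor ['boots', 'stove']
```

Every member of a segment receives the same segment-level recommendations. This is the source of the quality loss noted above.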
The search-based method is a kind of content-based method that treats the recommendation problem as a search for related items. Given the user’s purchased and rated items, the algorithm constructs a search query to find other popular items by the same author, artist, or director, or with similar keywords or subjects (Linden, Smith, & York, 2003).
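The search described above can be sketched in Python as follows. The catalog layout, with an author, a keyword set, and a popularity count per item, is a hypothetical stand-in for real product metadata. The “query” is the set of authors and keywords of the user's purchases, and matches are returned in order of popularity.

```python
def search_based(catalog, purchased, top_n=3):
    """Find popular items sharing an author or keyword with the user's purchases.

    catalog: item -> {"author": str, "keywords": set[str], "popularity": int}
    """
    # Build the "query" from the purchased items' metadata
    authors = {catalog[i]["author"] for i in purchased}
    keywords = set().union(*(catalog[i]["keywords"] for i in purchased))
    hits = [
        item for item, meta in catalog.items()
        if item not in purchased
        and (meta["author"] in authors or meta["keywords"] & keywords)
    ]
    # Return the most popular matches first
    return sorted(hits, key=lambda i: catalog[i]["popularity"], reverse=True)[:top_n]

catalog = {
    "b1": {"author": "Asimov", "keywords": {"robots"}, "popularity": 50},
    "b2": {"author": "Asimov", "keywords": {"empire"}, "popularity": 80},
    "b3": {"author": "Clarke", "keywords": {"robots", "space"}, "popularity": 60},
    "b4": {"author": "Austen", "keywords": {"romance"}, "popularity": 90},
}
print(search_based(catalog, {"b1"}))  # ['b2', 'b3']
```

The most popular book, `b4`, is never suggested because it shares no metadata with the purchase. This illustrates how such recommendations can be too narrow.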
Because they rely on item content rather than on other customers’ ratings, search-based methods can alleviate the cold-start problem of collaborative filtering algorithms. However, they have disadvantages too. The recommendations are often either too general or too narrow. Moreover, the search-based method works well only for items with rich content and significant attributes, which limits its use.
Case 1: Amazon and Item-to-item Collaborative Filtering
In contrast to the earlier algorithms described above, this paper examines two special recommendation systems. The first belongs to Amazon, the largest e-commerce marketplace and cloud computing platform in America. The second belongs to Taobao, the biggest e-commerce website in China. I will use them as two main cases and analyze and compare them.
Amazon, the largest e-commerce platform in the world, is investing heavily in talent and resources to integrate artificial intelligence into its recommendation system, especially “deep learning” technology, to make the system more efficient. Amazon’s recommendation algorithm is called item-to-item collaborative filtering. Amazon uses this algorithm to integrate recommendations across the buying experience, from product discovery to checkout, and to personalize the online store for each customer. Because recommendation systems pinpoint products for specific customers according to their profiles, their click-through and conversion rates can vastly exceed those of untargeted content such as banner advertisements and top-seller lists (Linden, Smith, & York, 2003).
In Amazon’s item-to-item collaborative filtering algorithm, the input data includes customers’ purchase history, the items in their shopping carts, the items they have rated and liked, and what other similar customers have viewed and purchased (Arora, 2016). The algorithm matches each of the customer’s purchased and rated items to similar items and then combines those similar items into a recommendation list.
To provide the most similar match for each given item, the algorithm builds a similar-items table that captures the items customers tend to purchase together. One could build a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair (Linden, Smith, & York, 2003). However, many product pairs have no common customers, so this approach wastes processing time and memory. The following iterative algorithm offers a better approach. It calculates the similarity only between a single product and its related products, meaning those that share at least one customer with it.
For each item in product catalog, I1
    For each customer C who purchased I1
        For each item I2 purchased by customer C
            Record that a customer purchased I1 and I2
    For each item I2
        Compute the similarity between I1 and I2
Iterative algorithm to calculate the similarity between a single product and all related products
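The steps above can be sketched as runnable Python. The sketch assumes purchases are available as a mapping from customer to a set of item IDs, and all names are hypothetical. It uses cosine similarity between the two items' 0/1 customer vectors, one common choice for the “compute the similarity” step. Only item pairs that share a customer are ever visited or stored.

```python
import math
from collections import defaultdict

def build_similar_items(purchases: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Similar-items table following the iterative algorithm above.

    Only pairs that share at least one customer are ever visited or stored.
    """
    buyers: dict[str, set[str]] = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table: dict[str, dict[str, float]] = {}
    for i1, customers in buyers.items():        # for each item I1
        together: dict[str, int] = defaultdict(int)
        for c in customers:                     # for each customer C who purchased I1
            for i2 in purchases[c]:             # for each item I2 purchased by C
                if i2 != i1:
                    together[i2] += 1           # record that a customer bought I1 and I2
        # for each related item I2: cosine similarity of the two items' customer vectors
        table[i1] = {
            i2: n / math.sqrt(len(customers) * len(buyers[i2]))
            for i2, n in together.items()
        }
    return table

purchases = {"c1": {"a", "b"}, "c2": {"a", "b", "c"}, "c3": {"a", "c"}, "c4": {"d"}}
table = build_similar_items(purchases)
print(round(table["b"]["c"], 2))  # 0.5
print(table["d"])                 # {}
```

The expensive part of this table can be built offline. Item `d` shares no customers with any other item, so it gets an empty entry instead of a row of zeros.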
Given a similar-items table, the algorithm finds items similar to each of the customer’s purchases and ratings, aggregates those items, and then recommends the most popular or correlated items (Linden, Smith, & York, 2003).
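The online step can be sketched in Python as a lookup against a precomputed table. The example table below is hypothetical. As an assumption of this sketch, items are aggregated by summing their similarities across the customer's purchases, so an item related to several purchases ranks higher.

```python
def recommend_from_table(table: dict[str, dict[str, float]], owned: set[str], top_n=3) -> list[str]:
    """Aggregate items similar to everything the customer owns, skipping owned items."""
    scores: dict[str, float] = {}
    for item in owned:
        for other, sim in table.get(item, {}).items():
            if other not in owned:
                # Summing similarities favours items related to several purchases
                scores[other] = scores.get(other, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical precomputed similar-items table
table = {
    "a": {"b": 0.8, "c": 0.8},
    "b": {"a": 0.8, "c": 0.5},
    "c": {"a": 0.8, "b": 0.5},
}
print(recommend_from_table(table, {"b"}))       # ['a', 'c']
print(recommend_from_table(table, {"a", "b"}))  # ['c']
```

The online cost depends only on how many items the customer owns and how many similar items each has. It does not grow with the total number of customers or catalog items.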
Amazon has not only used big data in its recommendation system but has also opened up its sophisticated artificial intelligence technology as a cloud platform. In May 2016, Amazon released DSSTNE (Deep Scalable Sparse Tensor Network Engine), an open-source artificial intelligence framework that Amazon developed to power its own product recommendation system (Arora, 2016). Because DSSTNE is open source, the promise of deep learning can extend beyond speech and object recognition to other areas such as research and recommendations. It can produce better predictions from less data, so customers are more likely to click on and buy the recommended products.
Case 2: Taobao and Tree-based Deep Model
Taobao is a Chinese online shopping website owned by Alibaba. It is the world’s biggest e-commerce website and, according to Alexa, the seventh most visited website. Recommendation, search, and advertisement placement are all core tasks in distributing internet content and data for e-commerce businesses. Handling data in Taobao’s recommendation system is hard because the volume is huge. Alibaba collects data on everything from a customer’s purchase history to the pages they view and the products they bookmark. Taobao divides its customers into 500 different segments and couples those segments with its information on the more than 1 billion products sold on the website. It then puts artificial intelligence to work to generate the most accurate recommendations possible (Zhu et al., 2018).
Taobao’s recommendation algorithm is called the tree-based deep recommendation model, abbreviated as TDM. The architecture of the system is shown in the following figure.
Fig. 2. The system architecture of Taobao display recommendation system
Retrieved from: https://dl.acm.org/citation.cfm?id=3219826
The recommendation process can be broken into two phases: matching and ranking. The first phase is matching. After receiving a page view request from a customer, the matching server uses user features, context features, and item features as input to generate a relatively small set of candidate items from the entire corpus. The corpus is huge, often hundreds of millions of items, while the filtered candidate set usually contains only hundreds of items (Zhu et al., 2018).
The second phase is ranking. With hundreds of candidate items, the real-time server uses more expressive but also more time-consuming models to predict indicators such as click-through rate or conversion rate. After ranking by strategy, a few items are ultimately displayed to each customer.
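The matching-then-ranking pipeline can be sketched in Python with two scoring functions. This does not reproduce TDM itself. The scorers, item IDs, and candidate counts are hypothetical stand-ins for the matching model and the click-through-rate model. The point is that the expensive model only ever sees the small candidate set.

```python
import heapq
from typing import Callable, Iterable

def match_then_rank(
    corpus: Iterable[int],
    cheap_score: Callable[[int], float],
    expensive_score: Callable[[int], float],
    n_candidates: int = 100,
    n_display: int = 5,
) -> list[int]:
    """Two-stage retrieval: a cheap matching pass, then an expensive ranking pass."""
    # Matching: score the whole corpus with a cheap model, keep only a few candidates
    candidates = heapq.nlargest(n_candidates, corpus, key=cheap_score)
    # Ranking: the expensive model (e.g. a predicted click-through rate) sees only the candidates
    return heapq.nlargest(n_display, candidates, key=expensive_score)

# Hypothetical scorers over item IDs 0..9999
shown = match_then_rank(range(10_000), cheap_score=lambda i: -i, expensive_score=lambda i: i % 10)
print(shown)  # [9, 19, 29, 39, 49]
```

In this example, the expensive scorer is called 100 times instead of 10,000. With a corpus of hundreds of millions of items, this gap is what makes real-time ranking feasible.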
Taobao adopted the tree-based deep recommendation model because it addresses information overload. E-commerce platforms typically have a gigantic corpus, so predicting each customer’s preference for every item is very costly and full-corpus retrieval is problematic. With TDM, the corpus is filtered into a much smaller set at the beginning of the recommendation process, so later stages do not have to handle the full volume of data. In addition, recommended items should be as novel as possible. Interactions between customers and products are valuable and are updated constantly, so even small changes can influence the outcome of the recommendation system. Results that simply replicate a customer’s previous behavior are undesirable. In this respect, memory-based and item-based collaborative filtering both fall short (Zhu et al., 2018).
By comparison, the tree-based deep recommendation model makes a huge amount of information manageable. Frustrated with the shortcomings of existing models, the Alibaba tech team developed this novel model. It leverages a hierarchy of information and turns the recommendation problem into a series of hierarchical classification problems (Zhu et al., 2018). With this model, more accurate and efficient predictions can be made from a large corpus. Like Amazon, Alibaba has also invested beyond the recommendation system itself. Its cloud computing subsidiary developed an operating system called Apsara, which organizes data centers into a computational engine that can process more than 175,000 transactions per second. Apsara further demonstrates the efficiency of Alibaba’s technology in serving customers.
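Turning retrieval into a series of hierarchical classification steps can be illustrated with top-down beam search over an item tree. The Python sketch below keeps only the best few nodes at each level. The tree, node names, and scores are hypothetical. The `score` function merely stands in for TDM's learned deep model, and the model training and tree learning described by Zhu et al. (2018) are not shown.

```python
from typing import Callable

def tree_beam_search(
    tree: dict[str, list[str]],
    root: str,
    score: Callable[[str], float],
    beam: int = 2,
) -> list[str]:
    """Top-down retrieval: keep only the `beam` highest-scoring nodes at each level.

    tree maps each node to its children; leaves (items) have no entry or an empty list.
    `score` stands in for the learned model's preference of one user for a node.
    """
    frontier = [root]
    leaves: list[str] = []
    while frontier:
        children: list[str] = []
        for node in frontier:
            kids = tree.get(node, [])
            if kids:
                children.extend(kids)
            else:
                leaves.append(node)  # reached an item
        # Each level is a small classification step: which children to expand next
        frontier = sorted(children, key=score, reverse=True)[:beam]
    return sorted(leaves, key=score, reverse=True)[:beam]

tree = {
    "root": ["electronics", "clothing", "books"],
    "electronics": ["phone", "laptop"],
    "clothing": ["shirt", "shoes"],
    "books": ["novel"],
}
# Hypothetical scores for one user
scores = {"electronics": 0.9, "clothing": 0.5, "books": 0.2,
          "phone": 0.8, "laptop": 0.7, "shirt": 0.6, "shoes": 0.1, "novel": 0.3}
print(tree_beam_search(tree, "root", scores.get))  # ['phone', 'laptop']
```

The `books` branch is pruned at the first level, so its items are never scored. In a tree over a billion items, the model scores only a small number of nodes per level.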
Recommendation systems are widely used on e-commerce platforms to place the most promising products on each customer’s landing page and help customers find the products they want. However, earlier algorithm models cannot keep pace with new technology, so new models are required. Amazon and Taobao use their own special recommendation systems: item-to-item collaborative filtering and the tree-based deep model, respectively. The two systems share some characteristics. Both are integrated with cloud computing to provide better service to customers. Both first filter the data to improve efficiency and then rank the results, which yields higher-quality outcomes and more precise predictions. At the same time, the two systems differ in their core algorithm: Amazon uses an item-to-item similarity matrix, while Taobao uses user-to-candidate matching.
- Arora, S. (2016). Recommendation engines: How Amazon and Netflix are winning the personalization battle. MarTech Advisor.
- Errico, J. H., Sezan, M. I., Borden, G. R., Feather, G. A., & Grover, M. G. (2015). Collaborative Recommendation System. U.S. Patent No. 8,949,899. Washington, DC: U.S. Patent and Trademark Office.
- Li, B., Qian, C., Li, J., Tang, K., & Yao, X. (2016, July). Search based recommender system using many-objective evolutionary algorithm. In 2016 IEEE Congress on Evolutionary Computation (CEC) (pp. 120-126). IEEE.
- Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76-80.
- Öztürk, G., & Cicekli, N. K. (2011, June). A hybrid video recommendation system using a graph-based algorithm. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (pp. 406-415). Springer, Berlin, Heidelberg.
- Pazzani, M. J., & Billsus, D. (2007). Content-based recommendation systems. In The adaptive web (pp. 325-341). Springer, Berlin, Heidelberg.
- Sarwar, B. M., Karypis, G., Konstan, J. A., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web (pp. 285-295). ACM.
- Ševce, O., Tvarožek, J., & Bieliková, M. (2010, September). Term ranking and categorization for ad-hoc navigation. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications (pp. 71-80). Springer, Berlin, Heidelberg.
- Wang, Y., Zhang, J., & Vassileva, J. (2010, September). Towards effective recommendation of social data across social networking sites. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications (pp. 61-70). Springer, Berlin, Heidelberg.
- Zhu, H., Li, X., Zhang, P., Li, G., He, J., Li, H., & Gai, K. (2018, July). Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1079-1088). ACM.