In this section, we describe a new method of obtaining the neighbors of a test customer using a graph. It is based on the observation that neighbors with not only high similarity but also low similarity with respect to the test customer may considerably influence the prediction. The graph approach allows us to overcome the fact that the Pearson correlation coefficient is calculated between only a pair of customers. For example, if the similarity between customers A and B is high and the similarity between customers B and C is high, then the similarity between customers A and C is also considered high; the same holds for low similarities. That is, we can obtain valuable information for prediction through the transitivity of similarities.
In this graph approach, we treat the dataset as a weighted undirected graph. Each vertex in the graph represents a customer, and a weighted edge corresponds to the similarity between its two endpoints (customers). First, we search for the vertex v with the highest similarity to a given test customer u. We then search for neighboring vertices whose similarities are either larger than H or smaller than L, where H and L are threshold values that can be determined through various experiments. The search is performed in a breadth-first manner; that is, we search the neighbors of each vertex w among the neighbors of v according to H and L. The search stops when we have enough neighbors for prediction. If a vertex has already been chosen as a neighbor, it is not considered again during the search.
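The breadth-first search described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation; the function name, the adjacency-dictionary representation, and the `max_neighbors` and `depth_count` parameters are assumptions made for the sketch.

```python
from collections import deque

def select_neighbors(graph, u, H, L, max_neighbors, depth_count):
    """Select neighbors of test customer u by breadth-first search.

    graph: dict mapping each customer to a dict {neighbor: similarity}.
    H, L: thresholds; only edges with weight > H or < L are followed.
    Hypothetical sketch of the search described in the text.
    """
    if not graph.get(u):
        return []
    # Start from the vertex v with the highest similarity to u.
    v = max(graph[u], key=graph[u].get)
    neighbors = [v]
    visited = {u, v}          # vertices already chosen are skipped
    queue = deque([(v, 0)])   # (vertex, depth from v)
    while queue and len(neighbors) < max_neighbors:
        w, depth = queue.popleft()
        if depth >= depth_count:
            continue  # do not expand beyond depth_count levels
        for x, sim in graph[w].items():
            if x in visited:
                continue
            if sim > H or sim < L:
                visited.add(x)
                neighbors.append(x)
                queue.append((x, depth + 1))
                if len(neighbors) >= max_neighbors:
                    break
    return neighbors
```

With the small graph of Figure 1 in mind, both strongly similar (weight > H) and strongly dissimilar (weight < L) vertices are collected, so the prediction step can exploit negative correlations as well as positive ones.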
Figure 1 shows how the neighbors of u are selected when depth-count = 2. A solid line indicates that the weight of the edge is larger than H; a dotted line indicates that the weight of the edge is less than L.
Fig. 1. An example of searching the neighbors with the graph approach when depth-count = 2
We now describe the proposed algorithm. Before applying the graph approach to the dataset, we remove unnecessary customers with the K-Means clustering method. After obtaining k clusters, we choose the cluster that holds the customers with the highest similarities with respect to the test customer u. We then apply the graph approach to this “best” cluster. The following describes the overall algorithm formally.
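The cluster-selection step can be sketched as below. The text does not specify how the “best” cluster is scored, so mean similarity to the test customer is used here as an assumption; the cluster assignments themselves are taken as given (e.g., produced by any standard K-Means implementation over the customers' rating vectors).

```python
def best_cluster(clusters, sims_to_u):
    """Pick the cluster whose members are, on average, most similar to u.

    clusters: list of clusters, each a list of customer ids (K-Means output).
    sims_to_u: dict mapping customer id -> similarity to the test customer u.
    Hypothetical sketch; mean similarity is an assumed selection criterion.
    """
    return max(clusters,
               key=lambda c: sum(sims_to_u[x] for x in c) / len(c))
```

The graph search is then restricted to the returned cluster, so vertices from low-similarity clusters are never visited, reducing the cost of the breadth-first search.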