EFFICIENT OPTIMIZATION FOR L-EXTSKY RECOMMENDATIONS

Original scientific paper

L-extSKY recommendation has recently received a lot of attention in the information retrieval community. The literature [1] proposes an algorithm, EARG (Efficient Approach based on Regular Grid), to produce the L-extSKY objects in one single subspace. However, in multi-user environments, the system generally handles multiple subspace L-extSKY recommendations simultaneously. Hence, in this paper, we present an efficient algorithm, AOMSR (Algorithm for Optimizing Multiple Subspace L-extSKY Recommendations), to remarkably reduce the total response time. Furthermore, we discuss two interesting variations of L-extSKY recommendation, i.e., global constraint L-extSKY recommendation and local constraint L-extSKY recommendation, which are meaningful in practice, and show how our algorithm can be applied for their efficient processing. Detailed theoretical analyses and extensive experiments demonstrate that our solution is both efficient and effective.


Introduction
The skyline recommendation and its computation have attracted much attention recently [2]. To the best of our knowledge, various techniques have been proposed for subspace skyline recommendation. The existing approaches can be classified into three categories: (1) the first category [3, 4] comprises solutions that assume the recommendation subspace is fixed and use different index structures to improve recommendation performance; (2) methods in the second category [5, 6] consider how to efficiently process all 2^k − 1 subspace skyline recommendations; (3) the third category [7, 8] tackles the problem of optimizing arbitrary single subspace skyline recommendations. Clearly, if the input dataset is fixed, then the recommendation results returned by these existing approaches remain unchanged.
The literature [1] points out that in most real applications, for a ζ-dimensional dataset AD, the cardinality of its recommendation result does not exceed (ln^(ζ−1)|AD|)/(|AD|⋅(ζ−1)!) of that of AD [9]. So the recommendation result returned by the existing approaches cannot efficiently assist the users in exploring the whole dataset. Motivated by this fact, the literature [1] extends the semantics of skyline recommendation and proposes a new type of recommendation called L-extSKY recommendation. Given a set of ζ-dimensional objects, an L-extSKY recommendation on the subspace V (|V|≤ζ) finds the objects that are dominated by at most L objects on V. Conceptually, L represents the thickness of the skyline; the special case L = 0 corresponds to the conventional skyline recommendation. It is easy to see that, compared with the traditional skyline recommendation, the L-extSKY recommendation has at least two advantages: (1) it provides more opportunities for users to explore the whole input dataset, since it also considers non-skyline objects; and (2) users can flexibly adjust the parameter L to obtain the recommendation result they need. Consequently, the L-extSKY recommendation is more meaningful in practice. Furthermore, the literature [1] presents an algorithm EARG (Efficient Approach based on Regular Grid) to produce the L-extSKY objects in an arbitrary single subspace. The EARG approach utilizes a regular grid structure and prunes all cells that are dominated by any other cells, and hence it can evidently reduce the number of comparisons between objects. However, in multi-user environments, the system generally handles multiple subspace L-extSKY recommendations simultaneously. Hence, in this paper, we propose an efficient algorithm AOMSR (Algorithm for Optimizing Multiple Subspace L-extSKY Recommendations) to markedly reduce the total recommendation time. The AOMSR algorithm first organizes all issued subspace L-extSKY recommendations as a recommendation tree, and then uses a sharing mechanism along tree paths to improve the total performance of these L-extSKY recommendations. Moreover, we discuss two interesting variations of L-extSKY recommendation, i.e., global constraint L-extSKY recommendation and local constraint L-extSKY recommendation, which are meaningful in practice, and show how our algorithm can be applied for their efficient processing. Detailed theoretical analyses and extensive experiments demonstrate that our solution is both efficient and effective.
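For intuition, the L-extSKY semantics can be sketched with a naive quadratic scan. This illustrative Python sketch is ours, not the EARG algorithm of [1]; smaller values are assumed better on every dimension, and the function names are our own.

```python
from typing import List, Sequence, Tuple

def dominates(p: Sequence[float], q: Sequence[float], V: Sequence[int]) -> bool:
    """p dominates q on subspace V: p is no worse on every dimension of V
    and strictly better on at least one (smaller values assumed better)."""
    return all(p[d] <= q[d] for d in V) and any(p[d] < q[d] for d in V)

def l_extsky(AD: List[Tuple[float, ...]], V: Sequence[int], L: int) -> List[Tuple[float, ...]]:
    """Objects dominated by at most L others on V; L = 0 gives the classic skyline."""
    result = []
    for q in AD:
        count = sum(1 for p in AD if dominates(p, q, V))
        if count <= L:
            result.append(q)
    return result
```

Raising L from 0 enlarges the answer monotonically, which is exactly the "thickness" knob described above.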

The EARG approach
In this section, we briefly review the EARG approach proposed in the literature [1].
The EARG approach uses a regular grid [10] to index the objects. Consider a set AD of objects with ζ dimensions in total, where each dimension d over AD has a set of disjoint ranges that partition the value domain of d. Let the extent of each cell on dimension d_i (1≤i≤ζ) be δ_i. Then the cell c[a_1, …, a_ζ] contains all objects whose i-th dimension falls in the range [(a_i−1)⋅δ_i, a_i⋅δ_i). Conversely, given an object p with attributes (p.x_1, …, p.x_ζ), its covering cell can be determined in constant time. Moreover, the EARG approach uses the following four definitions and one theorem, which provide an opportunity to optimize the performance of the L-extSKY recommendation.
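Under the cell ranges reconstructed above, mapping an object to its covering cell is a constant-time computation per object; a minimal sketch (the function name is ours):

```python
import math
from typing import Sequence, Tuple

def covering_cell(p: Sequence[float], delta: Sequence[float]) -> Tuple[int, ...]:
    """Return the cell index c[a_1, ..., a_z] whose range [(a_i-1)*delta_i, a_i*delta_i)
    contains p on every dimension; one floor division per dimension."""
    return tuple(math.floor(x / d) + 1 for x, d in zip(p, delta))
```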
Definition 1. Assume that the full space F consists of the ζ dimensions of AD, and let C_1(V) = c[a_1, …, a_v] and C_2(V) = c[b_1, …, b_v] be two cells on a subspace V of dimensionality v. In order to efficiently realize the EARG approach, the literature [1] distinguishes three possibilities for the relationship between any two cells on V.

Definition 2. If C_1(V) and C_2(V) satisfy ∀i∈[1, v], a_i < b_i, then we say "C_1(V) fully dominates C_2(V)".

Definition 3. If C_1(V) and C_2(V) satisfy the following condition, then we say "C_1(V) partially dominates C_2(V)": ∃U⊂V, ∀i∈[1, |U|], a_i = b_i, and ∀j∈(|U|, v], a_j < b_j.

Definition 4. If C_1(V) and C_2(V) satisfy the following condition, then we say "C_1(V) is incomparable with C_2(V)": ∃i∈[1, v], a_i < b_i, and ∃j∈[1, v], b_j < a_j.

Theorem 1. Let the cells C_1(V) and C_2(V) cover the object sets S_1(V) and S_2(V), respectively. Then, if C_1(V) fully dominates C_2(V), every object in S_1(V) dominates every object in S_2(V) on V; and if C_1(V) neither fully nor partially dominates C_2(V), no object in S_1(V) can dominate any object in S_2(V) on V.

Based on Definitions 1÷4 and Theorem 1, the EARG approach can be efficiently implemented as Algorithms 1 and 2.

Algorithm 1: EARG
Input: the set of ζ-dimensional objects AD, where each object is associated with one counter count whose initial value is 0; the subspace V and its dimensionality v; the regular grid index Ξ(AD, F); the parameter L.
(Algorithm 1, fragment) … For each cell C_α(V) in seqV, visited in order, Do … For ∀p∈CR(C_α(V)) Do 11. If p.count ≥ L Then flag ← True; …

The algorithm sorts the cells into the sequence seqV using an ordering which has at least two advantages: (1) if a cell C_α(V) is visited earlier, then the probability that it is fully or partially dominated by other cells is lower; (2) it ensures that a cell C_β(V) visited after C_α(V) cannot fully or partially dominate C_α(V), and hence, according to Theorem 1, the objects in S_β(V) cannot dominate any object in S_α(V) on V. Consequently, this ordering can remarkably reduce the number of comparisons between objects. For each visited cell C_α(V), step 6 uses the function OBCANS, described in Algorithm 2, to obtain all candidate L-extSKY objects in S_α(V); that is, if p∈CR(C_α(V)), then p is dominated by at most L objects in S_α(V) on V. In steps 8÷19, for each object p∈CR(C_α(V)), the algorithm updates its counter, and if p.count exceeds L, then p can be safely removed from CR(C_α(V)). Specifically, steps 9÷13 consider each cell C_λ(V) which fully dominates C_α(V), while steps 14÷19 consider each cell C_u(V) which partially dominates C_α(V). In steps 9÷13, if C_λ(V) fully dominates C_α(V), then according to Definition 2, for each object p∈CR(C_α(V)), p does not need to be compared with any object in C_λ(V), and |S_λ(V)| is added to p.count directly. In steps 14÷19, if C_u(V) partially dominates C_α(V), then according to Definition 3, for each object p∈CR(C_α(V)), p needs to be compared with all objects in S_u(V), and the number of objects in S_u(V) which dominate p on V is added to p.count. Furthermore, in step 20, if the algorithm finds that there exists some object p∈S_α(V) whose counter is greater than or equal to L, then all cells fully dominated by C_α(V) can be safely removed from seqV.
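Our reading of the three cell relationships can be sketched as follows. This is an illustrative interpretation of Definitions 2÷4 (cell indices are tuples, and the prefix-based partial-dominance test follows the condition quoted above); it is not code from [1].

```python
from typing import Sequence

def full_dominates(c1: Sequence[int], c2: Sequence[int]) -> bool:
    """C1(V) fully dominates C2(V): strictly smaller index on every dimension,
    so every object of C1 dominates every object of C2 (no comparisons needed)."""
    return all(a < b for a, b in zip(c1, c2))

def partial_dominates(c1: Sequence[int], c2: Sequence[int]) -> bool:
    """C1(V) partially dominates C2(V), per the quoted condition: equal on a
    nonempty proper prefix U of the dimensions and strictly smaller on the
    rest; objects must then still be compared pairwise."""
    v = len(c1)
    for u in range(1, v):  # |U| ranges over 1 .. v-1
        if all(c1[i] == c2[i] for i in range(u)) and \
           all(c1[j] < c2[j] for j in range(u, v)):
            return True
    return False

def incomparable(c1: Sequence[int], c2: Sequence[int]) -> bool:
    """C1(V) and C2(V) are incomparable: each is strictly smaller somewhere."""
    return any(a < b for a, b in zip(c1, c2)) and any(b < a for a, b in zip(c1, c2))
```

Full dominance lets EARG add |S_λ(V)| to a counter in one step; incomparability lets it skip the pair entirely.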
The function OBCANS utilizes the idea of presorting objects [4] to obtain CR(C_α(V)), as shown in the following algorithm.

Algorithm 2: OBCANS
Input: the set S_α(V) that consists of all objects inside C_α(V); the subspace V; the parameter L.
In steps 2÷3 of Algorithm 2, we sort the objects in S_α(V) in key-ascending order using the sorting operator, obtaining the object list L_α(V). This ordering has at least three advantages: (1) if an object p in L_α(V) is visited earlier, then the probability that it is dominated by other objects in L_α(V) is lower; … (3) it ensures that any object in L_α(V) cannot be dominated by the objects located after it on V, and hence it can dramatically reduce the number of comparisons between objects. For each object p∈L_α(V), in steps 7÷12, the algorithm updates p.count, and if p.count does not exceed L, then the algorithm adds p into CR(C_α(V)). Furthermore, the algorithm uses the Boolean variable flag1 as a cell-pruning indicator: when flag1 is True, we know that there exists some object in CR(C_α(V)) whose counter is greater than or equal to L, and hence all cells fully dominated by C_α(V) can be safely removed from seqV.
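The presorting idea that OBCANS borrows from [4] can be sketched as follows. We assume the sorting key is the coordinate sum on V (an assumption on our part), which guarantees that no object can be dominated by one appearing after it; the real OBCANS additionally maintains the flag1 indicator.

```python
from typing import List, Sequence, Tuple

def obcans_sketch(S: List[Tuple[float, ...]], V: Sequence[int], L: int) -> List[Tuple[float, ...]]:
    """Candidate L-extSKY objects inside one cell (illustrative sketch only).
    Sorting by coordinate sum on V means a later object can never dominate an
    earlier one, so each object is checked only against its predecessors."""
    def dominates(p, q):
        return all(p[d] <= q[d] for d in V) and any(p[d] < q[d] for d in V)

    ordered = sorted(S, key=lambda p: sum(p[d] for d in V))
    candidates = []
    for i, p in enumerate(ordered):
        count = sum(1 for q in ordered[:i] if dominates(q, p))
        if count <= L:  # p is dominated by at most L objects of this cell
            candidates.append(p)
    return candidates
```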

Optimizing multiple subspace L-extSKY recommendations simultaneously
This section focuses on optimizing multiple subspace L-extSKY recommendations simultaneously in multi-user environments. Let u be the number of subspace L-extSKY recommendations handled in such environments. A naïve solution is to run the EARG approach [1] on the original dataset u times. Obviously, the naïve solution becomes inefficient as the cardinality of the original dataset increases, as can be seen in our experimental evaluation. Motivated by this fact, we propose an efficient algorithm AOMSR to reduce the total recommendation time. The AOMSR algorithm first organizes these u L-extSKY recommendations as a subspace recommendation tree, and then exploits the sharing mechanism along tree paths to reduce the total recommendation time. For ease of understanding, in the following parts we use p.count_V to denote the counter of p on V.
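The per-subspace counters p.count_V can be kept, for example, as a dictionary keyed by the subspace. This is a bookkeeping sketch only; the class and field names are ours.

```python
from collections import defaultdict
from typing import Dict, Tuple

class Obj:
    """An object carrying one dominance counter per subspace, mirroring the
    paper's p.count_V notation (illustrative sketch; names are ours)."""
    def __init__(self, coords: Tuple[float, ...]):
        self.coords = coords
        # count[V] = number of known dominators of this object on subspace V
        self.count: Dict[Tuple[int, ...], int] = defaultdict(int)

p = Obj((0.2, 0.7, 0.1))
p.count[(0, 2)] += 1  # one more dominator of p on the subspace V = {d0, d2}
```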
Definition 5. Assume that the list SQB contains the u subspaces <V_0, …, V_u>. If SQB satisfies the following two properties, then we call it a consistent subspace sequence: (1) …

Definition 6. The subspace recommendation tree T_sb = (ND, ES) is built over a consistent subspace sequence SQB = <V_0, …, V_u> and needs to satisfy the following properties: … (c) ¬∃V_h∈ND such that <V_i, V_j>∈ES, <V_i, V_j> satisfies Properties 2(a) and 2(b), and h < i.

Definition 7. Let AD be the set of ζ-dimensional objects, and rt be the root node of the subspace recommendation tree T_sb. For each non-root node V, we assume that PV is its parent node. Then we can recursively define the seed subspace L-extSKY set seed(V) for each node in T_sb as follows: …

Definition 8. Let AD be the set of ζ-dimensional objects, and rt be the root node of T_sb. For each non-root node V, we assume that PV is its parent node. Then, based on the seed subspace L-extSKY set, we can define the shadow subspace L-extSKY set rep(V) for each node in T_sb as follows: …

Theorem 2. Let AD be the set of ζ-dimensional objects, and rt be the root node of T_sb. For each non-root node V, we assume that PV is its parent node. Then, for each node V in T_sb, we have: …

…L-EXSET(seed(V)∪rep(V), V). This contradicts the above assumption that p∈L-EXSET(seed(V)∪rep(V), V). Therefore, L-EXSET(seed(V)∪rep(V), V)⊆L-EXSET(AD, V). Based on the above analyses, we know that the theorem holds.
The AOMSR algorithm is based on Theorem 2 and can be shown below.
Algorithm 3: AOMSR
1. SQB ← the consistent subspace sequence containing the u subspaces;
2. T_sb ← the subspace recommendation tree over SQB;
3. For each node V in T_sb (in breadth-first order) Do
4.   If V is the root node rt Then
5.     seed(rt) ← L-EXSET(AD, rt);  (obtained directly by the EARG approach)
6.   Else
7.     PV ← the parent node of V;
8.     seed(V) ← the seed subspace L-extSKY set of V;
10.    Divide seed(V) into m subsets SD_1, …, SD_m such that all the objects in the same subset share the same values on V;
11÷13. For each subset SD_i, obtain its key SD_i.key;
14.    LD ← <SD_1, …, SD_m>, ordered so that i < t ⇒ SD_i.key ≤ SD_t.key;
15.    For each subset SD_i in LD Do
16.      Select some object p from SD_i;
17.      SB_i ← the objects in AD−∆(PV) that fall inside the covering cell of p and share the same values as p on V;
18.      For each object q∈SB_i Do q.count_V ← p.count_V;
19.      For each SD_t which is located after SD_i in LD Do
20.        Select some object r from SD_t;
21.        If p dominates r on V Then
22.          If r.count_V + |SB_i| > L Then
23.            Delete all the objects of SD_t from seed(V) and remove SD_t from LD;
25.          Else For each object δ∈SD_t Do
26.            δ.count_V ← δ.count_V + |SB_i|;
27÷28. Add the objects of SB_i into rep(V), and produce L-EXSET(AD, V) from seed(V)∪rep(V);
End
In Algorithm 3, after obtaining the subspace recommendation tree T_sb (steps 1÷2), the AOMSR algorithm visits each node V and obtains its subspace L-extSKY set L-EXSET(AD, V) in breadth-first order (steps 3÷28). In steps 4 and 5, for the root node rt, the seed subspace L-extSKY set is obtained directly from the original dataset AD using the EARG approach, so seed(rt) = L-EXSET(AD, rt). For each non-root node V, the AOMSR algorithm utilizes the property of Theorem 2 to obtain L-EXSET(AD, V). The basic idea is that the algorithm first obtains the seed subspace L-extSKY set seed(V) in step 8, and then uses seed(V) to produce the correct subspace L-extSKY set L-EXSET(AD, V) in the succeeding steps. In step 10, the algorithm divides seed(V) into m subsets such that all the objects in the same subset share the same values on V. In steps 11÷13, for each subset SD_i, the algorithm obtains its key SD_i.key. Step 14 then organizes these m subsets as a list LD = <SD_1, …, SD_m> satisfying i < t ⇒ SD_i.key ≤ SD_t.key. Note that for any two subsets SD_i and SD_t in LD, if SD_i is located before SD_t, then the objects in SD_t cannot dominate the objects in SD_i on V. After obtaining the list LD, the algorithm processes each subset SD_i in order. In steps 16÷17, the algorithm first randomly selects an object p from SD_i and obtains the set SB_i, which consists of the objects that belong to AD−∆(PV), fall inside the covering cell of p, and share the same values as p on V.
Then, in step 18, for each object q∈SB_i, the algorithm sets q.count_V equal to p.count_V. Since the objects in SB_i are added into the shadow subspace L-extSKY set rep(V) and are ultimately incorporated into L-EXSET(AD, V) (steps 27÷28), the algorithm needs to handle each subset SD_t located after SD_i in LD (steps 19÷26). The handling process is as follows. The algorithm first randomly selects an object r from SD_t and checks whether r is dominated by p on V. If so, the algorithm evaluates the value of r.count_V + |SB_i|. If r.count_V + |SB_i| is greater than L, then the algorithm deletes all the objects in SD_t from seed(V) and removes SD_t from LD; otherwise, for each object δ in SD_t, |SB_i| is added to δ.count_V.

Technical Gazette 22, 5(2015), 1099-1106

…(λ(p)∧(p dominates r))⇒λ(r) ⇔ ¬(λ(p)∧(p dominates r))∨λ(r); and ∃p, r∈AD, λ(p)∧(p dominates r)∧¬λ(r). Hence, ∃p, p∈L-EXSET(λ(AD), V)^(0) ⇒ ∃p, p∉λ(L-EXSET(AD, V)^(0)). This contradicts the above assumption that λ(L-EXSET(AD, V)^(0)) = L-EXSET(λ(AD), V)^(0). Hence, the theorem stands.
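Steps 10÷14 of Algorithm 3 (grouping seed(V) into same-value subsets and ordering them by key) can be sketched as follows, again assuming the coordinate sum on V as the key (our assumption; the paper leaves SD_i.key unspecified here).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def build_LD(seed: List[Tuple[float, ...]], V: Sequence[int]) -> List[List[Tuple[float, ...]]]:
    """Partition seed(V) into subsets SD_1..SD_m whose members coincide on V,
    then order the subsets so that i < t implies SD_i.key <= SD_t.key; with
    this ordering a later subset can never dominate an earlier one."""
    groups: Dict[Tuple[float, ...], List[Tuple[float, ...]]] = defaultdict(list)
    for p in seed:
        groups[tuple(p[d] for d in V)].append(p)
    # illustrative key: sum of the shared values on V
    return [grp for _, grp in sorted(groups.items(), key=lambda kv: sum(kv[0]))]
```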

Experimental evaluation
This section conducts an empirical study of our methods using benchmark synthetic datasets. We evaluate the efficiency and scalability of the proposed methods. Using the data generator of [4], we generate two types of synthetic datasets as described in [4]: (a) independent datasets, where the dimension values of the generated objects are uniformly distributed; (b) anti-correlated datasets, where if an object is good in one dimension, it is unlikely to be good in the other dimensions. Each object has eight dimensions in total, stored as 4-byte floats. Furthermore, in the following experiments, we fix the number of ranges over each dimension to 5. All our experiments are implemented in Java, running on a PC with an i5-3210M 2,50 GHz processor and 2 GB main memory.
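The two synthetic distributions can be sketched roughly as follows. This is a simplified stand-in for the generator of [4], not its exact procedure: independent data draws each dimension uniformly, while the anti-correlated sketch jitters points around a fixed-sum hyperplane so that a good value in one dimension tends to force worse values elsewhere.

```python
import random
from typing import List, Tuple

def independent(n: int, dims: int, rng: random.Random) -> List[Tuple[float, ...]]:
    """Each dimension drawn uniformly and independently from [0, 1)."""
    return [tuple(rng.random() for _ in range(dims)) for _ in range(n)]

def anti_correlated(n: int, dims: int, rng: random.Random) -> List[Tuple[float, ...]]:
    """Rough sketch: rescale each point so its coordinates sum to about dims/2,
    add small jitter, then clamp to [0, 1]; a small value in one dimension
    thus implies larger values in the others."""
    data = []
    for _ in range(n):
        raw = [rng.random() for _ in range(dims)]
        s = sum(raw)
        point = tuple(
            min(1.0, max(0.0, x * (dims / 2) / s + rng.uniform(-0.05, 0.05)))
            for x in raw
        )
        data.append(point)
    return data
```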

Evaluating multiple subspace L-EXTSKY recommendations
In this subsection, we focus on evaluating the efficiency of our AOMSR algorithm. In order to show the superiority of the AOMSR algorithm, we also evaluate the naïve solution (i.e., running the EARG approach u times, where u is the number of subspaces) in the experiments. We implement this set of experiments in the experimental setting given below. From Fig. 2 and Fig. 3, we can make the following observations. (1) The AOMSR algorithm evidently outperforms the EARG solution in all cases. This is mainly because the AOMSR algorithm organizes all the needed subspace L-extSKY recommendations as a subspace recommendation tree, and then utilizes the sharing mechanism along tree paths to reduce the total recommendation time, while the EARG solution simply runs the EARG approach u times. For example, in Fig. 2b, when the number of subspaces is equal to 60, the recommendation time of the EARG solution exceeds 1951,3 seconds, while the AOMSR algorithm needs only 800,9 seconds; that is, in this case, the recommendation time of the AOMSR algorithm is only about 41,04 % of that of the EARG solution. (2) The superiority of the AOMSR algorithm over the EARG solution becomes more marked as the number of subspaces increases. This is mainly because when the number of subspaces increases, the number of L-extSKY recommendations which need not be answered from the original datasets increases, and hence the sharing effect of the AOMSR algorithm is more evident. For example, in Fig.
3b, when the number of subspaces is equal to 20, the recommendation times of the AOMSR algorithm and the EARG solution are about 469,60 seconds and 2323,29 seconds, respectively; that is, in this case, the AOMSR algorithm reduces the recommendation time by 1853,69 seconds. When the number of subspaces is equal to 60, the recommendation times of the AOMSR algorithm and the EARG solution are about 2262,70 seconds and 5455,31 seconds, respectively; that is, in this case, the AOMSR algorithm reduces the recommendation time by as much as 3192,61 seconds.
(3) The superiority of the AOMSR algorithm over the EARG solution is more marked for anti-correlated datasets than for independent ones. The main reason is that the EARG solution directly runs the EARG approach on the original datasets and requires more recommendation time for anti-correlated datasets. For example, in Fig. 2a, when the number of subspaces is equal to 50, the recommendation times of the AOMSR algorithm and the EARG solution are about 5,87 seconds and 12,59 seconds, respectively; that is, in this case, the AOMSR algorithm reduces the recommendation time by only 6,72 seconds. In Fig. 3a, when the number of subspaces is equal to 50, the recommendation times of the AOMSR algorithm and the EARG solution are about 1444,80 seconds and 4672,08 seconds, respectively; that is, in this case, the AOMSR algorithm reduces the recommendation time by as much as 3227,28 seconds.

Evaluating two variations of the subspace L-extSKY recommendation
In this subsection, we focus on evaluating the efficiency of processing our two variations of the subspace L-extSKY recommendation (i.e., global constraint L-extSKY recommendation and local constraint L-extSKY recommendation). The compared approaches are G_EARG (L_EARG), proposed in Section 4, and G_SUBSKY (L_SUBSKY), proposed in [7].
We implement this set of experiments in the experimental setting given below. From Fig. 4 and Fig. 5, we can observe that G_EARG and L_EARG outperform G_SUBSKY and L_SUBSKY, respectively, in all cases. This is mainly because the EARG approach evidently outperforms the SUBSKY approach [7] in all cases. For example, in Fig. 4b, when the dimensionality of the subspace is equal to 7, the recommendation time of the G_SUBSKY algorithm exceeds 1697,46 seconds, while the G_EARG algorithm needs only 298,17 seconds; that is, in this case, the recommendation time of the G_EARG algorithm is only about 17,57 % of that of the G_SUBSKY algorithm.


Conclusions and future work

In multi-user environments, systems generally handle multiple subspace L-extSKY recommendations simultaneously. Hence, in this paper, we propose an efficient algorithm AOMSR to evidently reduce the total response time. The AOMSR algorithm first organizes all the needed subspace L-extSKY recommendations as a subspace recommendation tree, and then employs the sharing mechanism along tree paths to enhance the total performance. Moreover, we discuss two interesting variations of the subspace L-extSKY recommendation which are meaningful in practice, and show how our algorithm can be applied for their efficient processing. We also present the detailed predicate condition under which these two variations become equivalent. The detailed theoretical analyses and extensive experiments demonstrate that our proposed solution is both efficient and effective.
Future work will focus on using more efficient index structures to improve the performance of our algorithm, on extending our algorithms to stream and P2P environments, and on further experimentation.
Algorithm 3 input: the parameter L; the set of ζ-dimensional objects AD; the regular grid index Ξ(AD, F); the u L-extSKY recommendations whose corresponding subspaces are <V_0, …, V_u>.

Experimental setting for the multiple-subspace experiments: (1) the cardinality Card of the input datasets varies in the range [1×10^5, 7×10^5]; (2) the parameter L is fixed to 3; (3) the dimensionality ζ of the full space is fixed to 8; (4) the number of subspaces varies in the range [20, 60] (shown in Fig. 1). Fig. 2 and Fig. 3 show the results of the experiments for independent datasets and anti-correlated datasets, respectively.

Figure 1 The relationship between the number of subspaces and the dimensionality of subspace

Experimental setting for the two variations: (1) the cardinality Card of every dataset varies in the range [1×10^5, 7×10^5]; (2) the parameter L is fixed to 3; (3) the dimensionality ζ of the full space is fixed to 8; (4) the dimensionality v of the subspace varies in the range [3, 7]; (5) for each dimension z, the constraint λ selects the region [min_z+(max_z−min_z)/5, min_z+(max_z−min_z)⋅4/5], where min_z and max_z are the minimum and maximum values of the objects in every dataset on z, respectively. Fig. 4 and Fig. 5 show the results of the experiments for independent datasets and anti-correlated datasets, respectively.

Figure 4 Independent datasets

Figure 5 Anti-correlated datasets