APPROXIMATE FILTERING OF REDUNDANT RFID DATA STREAMS IN MOBILE ENVIRONMENT

Original scientific paper
Recently, RFID technology has been widely used in applications such as object monitoring and tracing, owing to unique features such as non-contact, automatic, fast and simultaneous multi-target identification. However, because of environmental interference and the requirement of real-time detection, the data collected by RFID readers are often full of redundancy, which may reduce the processing efficiency of RFID application servers and even lead to false decisions. Therefore, it is necessary to filter the redundant data in RFID systems before transmitting them to the upper applications. To support approximate filtering of RFID data streams in mobile environments, this paper studies an effective redundancy filtering mechanism in the sliding window model. Firstly, we introduce the application background of RFID data streams and the middleware-based RFID system architecture. Then, we propose a temporal-spatial Bloom filter based on sliding windows, which extends the one-dimensional array of the standard Bloom filter to a two-dimensional array storing both the reader IDs and the observation timestamps of the original observation items. Meanwhile, to guarantee that the false positive rate does not increase as the filter becomes full, we suggest a random decay strategy for deleting expired elements. The error rates of the suggested filter, including false positives and false negatives, are analysed theoretically. Experimental results show that the suggested filter removes temporally redundant data effectively and copes well with location movement of RFID objects.


Introduction
Radio frequency identification (RFID) technologies allow readers to automatically identify objects associated with unique identifier codes from a distance and without line of sight. Because of unique features such as non-contact, automatic, fast and simultaneous multi-target identification, these technologies help to monitor, track and trace large numbers of objects cost-effectively. In recent years they have been applied in many domains, such as monitoring tagged objects in supermarkets and libraries, tracing drugs in supply chains and tracking airline luggage.
However, the adoption of RFID also brings new issues to RFID-based applications. For instance, RFID readers usually keep sending data about the tagged objects within their detection range to the application servers at specified detection cycles, so plenty of redundant data is generated. If not filtered, the redundant data may on the one hand reduce the processing efficiency of the application servers, and on the other hand lead to false decisions. For example, when the inventory of a supermarket is checked using RFID technologies without filtering the redundant data, the item statistics might be wrong since some items may be counted more than once. Likewise, when a car with an RFID tag passes through a highway toll station, the car may be charged more than once if the redundant data is not filtered. Therefore, it is necessary to filter the redundant data in RFID systems before transmitting them to the upper applications.
The basic idea of traditional data filtering strategies is sorting and merging, that is, first sorting the data stored in a database and then detecting duplicate records by comparing adjacent records. Methods of this kind include the basic string matching method [1], the recursive matching method [1], the Smith-Waterman algorithm [2], the edit distance method [3], the cosine similarity function [4], the priority queue algorithm [5], the sorted-neighbourhood method [6], and so on. However, all these methods require the data to be stored in a database before filtering. Thus, they cannot filter redundant data in a real-time data stream environment.
Currently, hash-based Bloom filters are mainly adopted for approximate filtering of data streams. Several algorithms have been suggested, such as the click-stream copy filter [7], the stable Bloom filter [8], the spectral Bloom filter [9], the dynamic count filter [10], the decaying Bloom filter [11], and so forth. Nonetheless, all these algorithms are only applicable to traditional data streams and do not consider the characteristics of RFID data streams.
As for filtering the data in RFID streams, a Redundant Reader Elimination (RRE) algorithm is proposed in [12]. In this algorithm, the readers detect the tagged objects within their ranges and write the total detection number and the reader's ID onto each tag. Considering that the RRE algorithm may fail in some cases, Hsu et al. [13] proposed the Layered Elimination Optimization (LEO) algorithm, which ensures that the first detection can be written onto the tags and reduces the number of writes. Since the detection order of the tags in the LEO algorithm is random, its reliability is relatively low. Both algorithms advocate that data filtering be done at the reader side, which needs expensive rewritable tags, hence neither is commonly applied in RFID systems. Mahdin and Abawajy [14] use a counting Bloom filter to remove redundant RFID data. This approach also assumes that redundancy elimination is processed at the reader side, so it fails if the RFID readers have no independent computing capability. Moreover, the algorithm may produce both false-positive and false-negative errors.
In recent years, filtering redundant data in RFID middleware has received more and more attention. RFID Cube [15] is designed to remove redundant data based on the data warehouse model. It assumes that the set of tagged objects moves together, and is therefore not applicable to a single object moving freely. Moreover, the method only suits filtering data already stored in the data warehouse, so it cannot be applied to real-time filtering of RFID data streams. Lee and Chung [16] proposed two kinds of time Bloom filters specifically for RFID data, the Time Bloom Filter (TBF) and its modified version, the Time Interval Bloom Filter (TIBF). Both filters store the detection times or arrival intervals of detected tags in the cells of the filter and then update their values, so that the error rates do not increase as the Bloom filter becomes "full". However, both are only applicable to static tags, in applications where multiple readers are deployed in the same physical region for highly reliable detection. Moreover, they do not consider the concept of data stream windows, so they cannot filter RFID data streams and mobile RFID objects effectively. Mahdin and Abawajy [17] proposed a Comparison Bloom Filter (CBF) based on the landmark window model, which stores the detection times in the filter cells and identifies the reader that an object belongs to by comparing the times. However, the algorithm is prone to accidental deletion of valid data and cannot be used in the sliding window model.
To support approximate filtering of RFID data streams in mobile environments, this paper studies an effective redundancy filtering mechanism based on the sliding window model. The rest of the paper is arranged as follows. We introduce the RFID application background and system architecture in Section 2. In Section 3, we discuss the principle and algorithm of the suggested temporal-spatial Bloom filter. We analyse the error rates of the suggested filter theoretically in Section 4. In Section 5, we verify and evaluate the performance of the proposed filter in an RFID-based supermarket. We conclude the paper at the end.

Background and system architecture

Background
The data generated by RFID readers is typically stream data. In some applications, the locations of the tagged objects may change over time. For example, in a smart supermarket, a tagged object may be taken from one shelf to another. Therefore, filtering RFID data streams must consider not only the observation time but also the observation location.
As shown in Fig. 1, three readers R1, R2 and R3 are deployed on shelves S1, S2 and S3, respectively, and detect the tagged objects at a regular detection cycle. It is assumed that the detection ranges of different readers do not overlap. Each detection record generated by the readers includes a tag ID, a reader ID and an observation time. T1, T2 and T3 represent three tagged objects. From time 1 to 10, T2 and T3 always stay on shelves S2 and S3, while the location of T1 changes, namely S1 (time 2) → S2 (time 8) → S3 (time 10). It is assumed in the scenario of Fig. 1 that the readers send only one observation record of each tagged object to the application servers in each time window, and all the remaining records are considered redundant. In the window between 0 and 10, T1 is detected 3 times by R1, 2 of which are redundant. Similarly, T2 and T3 are each detected 5 times by R2 and R3, respectively, 4 of which are redundant. However, T1's location changes twice in this window, i.e., T1 is also detected once each by R2 and R3, and these detections cannot be considered redundant. Hence, there are 5 valid records in Fig. 1, namely <T1, R1, 2>, <T2, R2, 2>, <T3, R3, 2>, <T1, R2, 8> and <T1, R3, 10>, of which 3 are related to T1.
Therefore, in scenarios like the smart supermarket, identifying redundant data cannot rely on the detection time alone; the filtering algorithm must take both time and location information into consideration.

Redundancy definition
The sliding window model for data streams can be categorized into count-based and time-based sliding windows. The former keeps the latest k arriving records (k is the window size), while the latter keeps the most recently arrived records, requiring their arrival times to fall within the current time window <t-w-1, t>, where t is the present time and w is the window size. This paper discusses a redundant data filtering algorithm based on the time-based sliding window model.
Definition 1. An RFID data stream DS is composed of n elements S1, S2, …, Sn, where each Si is a detection triplet <tid, rid, ts>: tid is the unique identification of the tag, rid is the identification of the detecting reader and ts is the observation timestamp.
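The two window models can be illustrated with a minimal Python sketch. The class names `CountWindow` and `TimeWindow` and the sample records are hypothetical, and the boundary rule (a record stays in the window while t − ts ≤ w) is an assumption rather than something fixed by the text.

```python
from collections import deque


class CountWindow:
    """Count-based sliding window: keeps the latest k records."""

    def __init__(self, k):
        # A bounded deque discards the oldest record automatically.
        self.records = deque(maxlen=k)

    def add(self, record):
        self.records.append(record)


class TimeWindow:
    """Time-based sliding window: keeps records at most w time units old."""

    def __init__(self, w):
        self.w = w
        self.records = deque()

    def add(self, record):
        # record is a (tid, rid, ts) triplet; ts is assumed non-decreasing.
        t = record[2]
        self.records.append(record)
        # Assumed boundary: evict a record once t - ts exceeds w.
        while t - self.records[0][2] > self.w:
            self.records.popleft()


cw = CountWindow(k=2)
tw = TimeWindow(w=10)
for rec in [("T1", "R1", 2), ("T1", "R1", 4), ("T2", "R2", 4), ("T1", "R1", 15)]:
    cw.add(rec)
    tw.add(rec)

print(list(cw.records))  # the two latest records
print(list(tw.records))  # only records with 15 - ts <= 10
```

The count-based window bounds memory by the number of records, whereas the time-based window bounds it by elapsed time, which is the property the filtering algorithm in this paper relies on.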
As stated in [18], the redundancy problem is recognized as a serious issue in RFID and sensor networks. Redundancy can happen at two different levels, i.e., the reader level and the data level.
- Redundancy at the reader level occurs when more than one reader is deployed to cover a specific location.
- Redundancy at the data level occurs within the data streams: the same RFID data can be captured several times in quick succession, so it is necessary to identify and eliminate those data before their use.
In this paper, we only consider the redundant data in the latter case, i.e., in the data stream environment.
Definition 2 (Data redundancy). Suppose W is a time window and DS is an RFID data stream. If there are two data items x∈DS and y∈DS in W such that y.tid = x.tid, y.rid = x.rid and y.ts < x.ts, then x is regarded as redundant data.
According to Definition 2, if the location of an object changes within a time window, i.e., the object is detected by different logical readers in two adjacent records, the latter record is not identified as redundant. Hence, the definition is applicable to filtering RFID objects in mobile environments.
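As a point of reference, Definition 2 can be checked exactly (without any Bloom filter) by remembering the last observation time of every (tid, rid) pair. The sketch below is illustrative only: the function name `exact_filter` is hypothetical, the window test ts − y.ts ≤ w is an assumed boundary convention, and the observation times 4 and 6 are made-up values for the repeated R1 readings of T1.

```python
def exact_filter(stream, w):
    """Yield (record, is_redundant) for a stream of (tid, rid, ts) triplets."""
    last_seen = {}  # (tid, rid) -> timestamp of the most recent observation
    for tid, rid, ts in stream:
        prev = last_seen.get((tid, rid))
        # Assumed window test: an earlier record y is still inside the
        # window when ts - y.ts <= w.
        redundant = prev is not None and ts - prev <= w
        last_seen[(tid, rid)] = ts
        yield (tid, rid, ts), redundant


# T1 is read three times by R1, then once by R2 and once by R3.
stream = [("T1", "R1", 2), ("T1", "R1", 4), ("T1", "R1", 6),
          ("T1", "R2", 8), ("T1", "R3", 10)]
kept = [rec for rec, redundant in exact_filter(stream, w=10) if not redundant]
print(kept)  # [('T1', 'R1', 2), ('T1', 'R2', 8), ('T1', 'R3', 10)]
```

This exact version stores one entry per (tid, rid) pair, so its memory grows with the number of tags and readers; that cost is what motivates an approximate, fixed-size filter.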

System architecture
RFID is one of the automatic identification technologies, which automatically identify objects, collect data about them, and enter those data directly into computer systems with no human intervention.
RFID technologies use radio waves to accomplish this function. In general, a basic RFID system consists of three components: RFID tags, RFID readers, and antennas. RFID systems work as follows: the RFID tags contain an integrated circuit and an antenna, which are used to transmit data to the RFID readers; the readers then convert the radio waves to a more usable form of data. Information collected from the tags is then transferred through a communications interface to a host computer system, where the data can be stored in a stream or a database and provide useful information for the predefined applications.
Fig. 2 shows the RFID-based application system architecture, which includes three levels:
- The RFID hardware level. This level includes RFID tags and RFID readers. The RFID readers receive and transmit data from the RFID tags using the connected antennas through radio waves. The functions of the RFID readers include powering the tags and reading data from or writing data to the tags.
- The RFID middleware level. The RFID middleware manages the readers, and filters and formats the raw RFID tag data so that they can be accessed by the various interested enterprise applications. Hence, the middleware is a key component for managing the flow of information between tag readers and enterprise applications.
- The RFID application level. Based on the middleware, the RFID system provides various services for end users through this level.
Therefore, before an RFID observation is sent to the application level, the middleware judges whether it is redundant. In this structure, the principle of filtering redundant data in RFID streams can be described as follows.
1) When an RFID reader detects an RFID tag, it generates a data item x and sends it to the RFID middleware.
2) The middleware first extracts the tid, rid and ts in x.
3) Then, the middleware determines whether there is a previous item y in the stream which satisfies the condition: y.tid = x.tid, y.rid = x.rid and y.ts < x.ts.
4) If the condition holds, x is determined to be redundant data; otherwise, x is not redundant data.

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. Because a Bloom filter does not need to store object identifiers, it has been widely applied to data stream filtering, with advantages such as short query time, small storage space and independence from the data volume. This section first gives a brief introduction to the standard Bloom filter, and then designs a temporal-spatial Bloom filter (TSBF) for filtering redundant data in RFID data streams.

Preliminary: the Standard Bloom filter
A standard Bloom filter is traditionally implemented as a single array of M bits, where M is the filter size. On filter creation all bits are reset to zero. A filter is also parameterized by a constant k that defines the number of hash functions used to set and test bits in the filter. Each hash function outputs one index into the M-bit array. When an element e is inserted into the filter, the bits at the k indexes h1(e), h2(e), …, hk(e) are set.
To query a Bloom filter for an element x, it suffices to verify whether all bits at indexes h1(x), h2(x), …, hk(x) are set. If one or more of these bits is not set, the queried element is definitely not present in the filter. Otherwise, if all these bits are set, the element is considered to be in the filter. Given this procedure, positive matches carry an error probability, since the tested indexes might have been set by the insertion of other elements.
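The insert and query procedures described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the class name `BloomFilter` is hypothetical, and deriving the k indexes from prefixed SHA-256 digests is an assumed hashing scheme chosen for simplicity.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m           # number of bits in the filter
        self.k = k           # number of hash functions
        self.bits = [0] * m  # all bits are zero on creation

    def _indexes(self, element):
        # Assumed scheme: hash the element with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, element):
        for idx in self._indexes(element):
            self.bits[idx] = 1

    def __contains__(self, element):
        # Present only if every one of the k bits is set.
        return all(self.bits[idx] for idx in self._indexes(element))


bf = BloomFilter(m=1024, k=3)
bf.add("T1")
print("T1" in bf)  # True: an inserted element is always reported present
```

A query for an element that was never inserted usually returns False, but it may return True when other insertions happen to have set all k of its bits, which is the false positive discussed above.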

The design of TSBF
To filter the redundant data defined in Definition 2 effectively, we extend the single array of the standard Bloom filter to a two-dimensional array.
Fig. 3 presents the structure of TSBF. There are m storage cells in total (numbered 0 to m−1), and each one is a two-dimensional array, where one dimension stores the reader ID and the other stores the observation timestamp of the detection record. Hence, the i-th cell in the filter can be represented as Mi[rid][ts].

Figure 3 TSBF Structure
First of all, the filter is initialized: both the reader ID and time dimensions of all the cells are set to zero. When a new data item arrives, whether it is redundant is determined by the reader ID and timestamp information stored in the k cells obtained by applying k independent hash functions h1, h2, …, hk to its tid as the index key.
Let w be the size of the sliding window. The process of identifying redundant data is as follows:
1) When a data record x enters the filter, it is mapped to k storage cells in TSBF by the k hash functions h1, h2, …, hk according to x.tid.
2) If there is any i∈{1, 2, …, k} with Mi[ts] = 0, the object with x.tid is newly arriving in the current window, and x is identified as non-redundant data. In the meantime, the timestamp and location information of all the k cells are updated, namely Mi[ts] = x.ts and Mi[rid] = x.rid.
3) If none of the time dimensions of the k cells equals 0, but there is any i∈{1, …, k} with x.rid ≠ Mi[rid], the location of the object has changed, so x is also identified as non-redundant data. Meanwhile, the timestamp and location information of all the k cells are updated.
4) When the reader IDs of all the k cells are equal to x.rid, if there is any i∈{1, 2, …, k} with x.ts − Mi[ts] > w, the object is newly arriving in the present window, so x is also identified as non-redundant data. Afterwards, the time and location information of all the k cells are updated.
5) Otherwise, the item is identified as redundant data, which is discarded after the time and location information of all the k cells are updated.
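The five steps above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the class name `TSBF`, the method name `is_redundant`, the SHA-256 based hashing and the `refresh_on_redundant` flag are all assumptions. The flag defaults to refreshing the cells for redundant items, as step 5 states; setting it to False leaves the cells untouched instead. Because 0 marks an empty cell, reader IDs and timestamps are assumed to be non-zero integers.

```python
import hashlib


class TSBF:
    def __init__(self, m, k, w, refresh_on_redundant=True):
        self.m, self.k, self.w = m, k, w
        self.rid = [0] * m  # reader ID dimension of each cell (0 = empty)
        self.ts = [0] * m   # timestamp dimension of each cell (0 = empty)
        self.refresh_on_redundant = refresh_on_redundant

    def _cells(self, tid):
        # Step 1: map the tag ID to k cells (assumed hashing scheme).
        cells = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{tid}".encode()).digest()
            cells.append(int.from_bytes(digest[:8], "big") % self.m)
        return cells

    def _update(self, cells, rid, ts):
        for c in cells:
            self.rid[c], self.ts[c] = rid, ts

    def is_redundant(self, tid, rid, ts):
        cells = self._cells(tid)
        new_tag = any(self.ts[c] == 0 for c in cells)           # step 2
        moved = any(self.rid[c] != rid for c in cells)          # step 3
        expired = any(ts - self.ts[c] > self.w for c in cells)  # step 4
        redundant = not (new_tag or moved or expired)           # step 5
        if not redundant or self.refresh_on_redundant:
            self._update(cells, rid, ts)
        return redundant


f = TSBF(m=64, k=3, w=10)
records = [("T1", 1, 2), ("T1", 1, 4), ("T1", 2, 8), ("T1", 2, 30)]
results = [f.is_redundant(*rec) for rec in records]
print(results)  # [False, True, False, False]
```

In the usage example the second record repeats the same reader within the window and is flagged, the third is kept because the reader changed, and the fourth is kept because more than w time units have passed.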

Deletion of expired data
In the sliding time window, the data in the streams expire as time passes, but the information of the expired data still occupies the storage cells if it is not deleted in time. In this way, the storage cells of the filter gradually become "full", which reduces accuracy. Hence, the expired data should be removed from the filter in time. However, to save storage space, Bloom filters do not store the values of the index keys, which means we cannot find out which cells hold expired data.
According to the characteristics of RFID data streams, if a tagged object is constantly detected by the same reader, the time values saved in its corresponding cells in the filter will be relatively large. On the contrary, if an object is not detected by a reader for a long time, the time values of the cells related to the object remain at their early values.
Therefore, we adopt a random decay strategy for the deletion of expired data. That is, after the redundancy identification of a newly arrived record is finished, we randomly select P cells in the filter and decrease their time values by 1. The reasoning is that if the time value of a cell has not been updated for a long time, it will eventually become 0 after repeated decrements. This is equivalent to deleting the object related to this cell from the filter, and thus avoids the increase of error rates caused by the filter becoming "full".
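A minimal sketch of the random decay step is given below. The function name `random_decay` and the representation of the filter as two parallel lists are assumptions; choosing the P cells without replacement via `random.sample` is also an assumption, since the text only says the cells are selected randomly. Resetting the reader ID when a time value reaches 0 follows the description of the algorithm given later in this section.

```python
import random


def random_decay(ts, rid, p, rng=random):
    """Decrease the time value of p randomly chosen cells by 1.

    ts and rid are the time and reader ID dimensions of the filter cells.
    A cell whose time value reaches 0 is treated as empty again.
    """
    for c in rng.sample(range(len(ts)), p):
        if ts[c] >= 1:
            ts[c] -= 1
            if ts[c] == 0:
                rid[c] = 0  # forget the reader ID of an emptied cell


ts, rid = [3, 1, 0, 5], [1, 2, 0, 1]
random_decay(ts, rid, p=4)  # p equals the filter size, so every cell is hit
print(ts, rid)  # [2, 0, 0, 4] [1, 0, 0, 1]
```

With a realistic P much smaller than the filter size, a cell that keeps being refreshed by new detections is rarely emptied, while a cell that is never refreshed drifts towards 0.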

The filtering algorithm of TSBF
Consider a set A = {a1, a2, …, an} of n elements. Bloom filters describe the membership information of A using a bit vector V of length m. For this, k hash functions h1, h2, …, hk with hi: X→{1…m} are used. Bloom filters can be built incrementally: as new elements are added to the set, the corresponding positions are computed through the hash functions and the bits are set in the filter. The details of TSBF are shown in Algorithm 1.
For each newly arrived data item, k hash functions are applied to its tid (Lines 3 to 4) to get k cells. Each cell is checked for a time value of 0. If there is a cell whose time value is 0, the filter is updated and the data is identified as non-redundant (Lines 6 to 9). Otherwise, the reader IDs stored in the cells are compared with the reader ID of the new data. If they differ, the filter is also updated and the data is also identified as non-redundant (Lines 11 to 13). Otherwise, the algorithm checks whether the value obtained by subtracting the time value of the cell from the time value of the data is greater than the window size w. If so, the filter is updated and the data is again identified as non-redundant (Lines 15 to 17). If none of the above conditions holds, the data is identified as redundant and is discarded (Lines 18 to 19). After the new data has been processed, P cells are selected randomly. If the time value of a selected cell is no less than 1, it is decreased by 1 (Lines 21 to 24). If the time value becomes 0, the rid of the cell is reset to 0 (Lines 25 to 26).

The example of TSBF
Fig. 4 shows an example of filtering RFID data stream, in which there are some observation data of T 1 and T 2 on two shelves.
The data stream includes 6 records arriving in sequence: <T1, R1, 2>, <T1, R1, 4>, <T2, R2, 4>, <T1, R1, 12>, <T2, R2, 16> and <T1, R2, 16>.
1) At the start, <T1, R1, 2> arrives at the filter. Assume the data is mapped to cells 0, 2 and 5 by the hash functions. Since this is newly arrived data, both the reader and time fields of the three cells are updated to rid = 1, ts = 2, as shown in Fig. 4a;
2) When <T1, R1, 4> arrives, its tid is the same as that of the former record, so it is mapped to the same locations in the filter. Moreover, since the reader IDs of the three cells are the same and the time distance is 4−2 < 10, this data is identified as redundant with no need to update the filter, as shown in Fig. 4b;
3) When <T2, R2, 4> arrives, it is mapped to cells 1, 4 and 6. The timestamps of all three cells are 0, so it is identified as newly arrived data and the filter is updated, as shown in Fig. 4c;
4) When <T1, R1, 12> arrives, since the rid of the corresponding cells is the same as that of the new data record and the time distance is 12−1 > 10, the filter is updated as shown in Fig. 4d;
5) When <T2, R2, 16> arrives, the corresponding cells of T2 are updated, as shown in Fig. 4e;
6) When <T1, R2, 16> arrives, the rid of the data record differs from the rid of the cells in the filter, so the object location has changed. Thus, the filter is updated as shown in Fig. 4f.
It should be noted that Fig. 4 only considers how to handle newly arrived data in the data streams, and does not consider the deletion of expired data in the streams.

An analysis of TSBF's error rates
One prominent feature of Bloom filters is that there is a clear trade-off between the size of the filter and the rate of false positives.
Just like traditional Bloom filters, TSBF generates false positives, that is, it identifies a non-redundant item as a redundant one. On the other hand, since TSBF deletes expired elements through the decay strategy, it also generates false negatives, that is, it identifies a redundant item as a non-redundant one. This section analyses the error rates of TSBF, including false positives and false negatives.
Theorem 1. The false positive rate of TSBF is:
$$P_{fp} = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$$
where k represents the number of hash functions, m represents the number of cells in the filter, and n represents the number of tagged objects inserted in the filter.
Proof: Suppose the hash functions in TSBF are simple and random, so that each element is mapped to each of the m cells with equal probability, independently of where other elements are mapped. In this case, for a particular cell i, the probability that it is not selected by a specific hash function when inserting a new element is:
$$1 - \frac{1}{m}$$
Thus, the probability that it is not mapped by any of the k hash functions is:
$$\left(1 - \frac{1}{m}\right)^{k}$$
If n objects are inserted, the probability that none of them sets the cell is:
$$\left(1 - \frac{1}{m}\right)^{kn}$$
Hence, the probability that the cell is set is:
$$1 - \left(1 - \frac{1}{m}\right)^{kn}$$
In the query stage, if all k cells corresponding to a queried object are set, the object can be identified as redundant data. Thereby, the probability of a wrong judgment is:
$$\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$$
Theorem 2. The false negative rate of TSBF is:
[equation missing in the source]
where m represents the size of the filter, n represents the number of randomly selected cells that are decreased by 1, k represents the total number of decrements in the filter, x represents the number of decrements applied to a certain cell, and h represents the number of hash functions.
Proof: False negatives in TSBF arise because the timestamps of the cells are randomly decreased by 1. Firstly, the probability that a certain cell is decreased by 1 is:
[equation missing in the source]
There exists a time interval in which newly arriving data records generate false negatives.
Suppose the time value of the present data in the filter is t and the size of the sliding window is w. According to Definition 2, when the time value of another data item with the same tid is more than t + w, the present data is definitely non-redundant data. However, if a false negative occurs in this cell, that is, if the time value has been decreased by 1 for x times, the time value of this cell becomes t − x. This means that when the time value of the other data item with the same tid is less than t − x + w, the present data is also definitely identified as redundant data. Therefore, the time interval generating false negatives lies between t − x + w and t + w.
Therefore, the probability that subsequent data with the same tid arrives in this time interval and is mapped by the hash functions to a cell that has been decreased by 1 is:
[equation missing in the source]
Hence, the probability of generating a false negative in a certain cell of the filter is:
[equation missing in the source]
A false negative occurs when one of the k hashed cells of the data is wrong, so the false negative rate of TSBF is:
[equation missing in the source]
It can be concluded that Eq. ( 7) is verified.
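For Theorem 1, the proof steps (a cell is missed by one hash with probability 1 − 1/m, by all k hashes of n objects with probability (1 − 1/m)^(kn), and a false positive needs all k queried cells to be set) can be evaluated numerically. The sketch below is illustrative: the function names are hypothetical, the parameter values are arbitrary, and the exponential form is the standard large-m approximation for Bloom filters rather than something stated in this paper.

```python
import math


def tsbf_fpr(k, m, n):
    """False positive probability following the proof steps of Theorem 1."""
    p_cell_unset = (1 - 1 / m) ** (k * n)  # no hash of any object hits the cell
    return (1 - p_cell_unset) ** k         # all k queried cells are set


def tsbf_fpr_approx(k, m, n):
    """Standard approximation (1 - 1/m)^(kn) ~ exp(-kn/m) for large m."""
    return (1 - math.exp(-k * n / m)) ** k


# Arbitrary illustrative parameters: k = 2 hash functions, n = 5000 objects.
for m in (10_000, 20_000, 40_000):
    exact = tsbf_fpr(2, m, 5_000)
    approx = tsbf_fpr_approx(2, m, 5_000)
    print(m, round(exact, 4), round(approx, 4))
```

The printed values decrease as m grows, which is the size versus false positive trade-off mentioned at the start of this section.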

Performance experiment and analysis

Experiment preparation
We built a real RFID-based smart supermarket with 10 readers and 200 tags, and collected data for 1 hour. Based on the distribution of these data, a large amount of simulation data was also generated, in which the number of readers ranges from 20 to 50 and the number of tagged objects ranges from 400 to 1200. Object movement among different shelves is simulated as well.
The performance measures of the experiments are the false positive rate (FPR) and the false negative rate (FNR), compared with the TIBF algorithm in [16]. Since the TIBF algorithm produces no false negatives, the comparison covers only the false positive rate.
The hardware environment is Intel Core i3-2330M CPU with 2GB memory; the software environment is Microsoft Windows 7 and Microsoft Visual C++ 6.0.
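The two measures can be computed from a ground-truth labelling (for example, one produced by an exact check of Definition 2) and the filter's decisions, as in the sketch below. The function name `error_rates` and the sample labels are hypothetical, and the normalisation is an assumption: FPR is taken relative to the truly non-redundant items and FNR relative to the truly redundant items, since the text does not state the denominators used in the experiments.

```python
def error_rates(truth, predicted):
    """Return (FPR, FNR); both inputs are lists of booleans, True = redundant."""
    pairs = list(zip(truth, predicted))
    # False positive: a non-redundant item identified as redundant.
    fp = sum(1 for t, p in pairs if not t and p)
    # False negative: a redundant item identified as non-redundant.
    fn = sum(1 for t, p in pairs if t and not p)
    non_redundant = truth.count(False)
    redundant = truth.count(True)
    fpr = fp / non_redundant if non_redundant else 0.0
    fnr = fn / redundant if redundant else 0.0
    return fpr, fnr


truth = [False, True, True, False, True]
predicted = [False, True, False, True, True]
print(error_rates(truth, predicted))  # (0.5, 0.3333333333333333)
```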

Experimental results and analysis
1) The influence of the amount of the hash functions on the error rates.
Suppose the size of the filter is $1 \times 10^4$, the number of tagged objects is $1 \times 10^5$, and the size of the sliding window is 10.
It can be seen from Figs. 5 and 6 that when the number of hash functions is between 2 and 4, both the FPR and FNR of TSBF are relatively small; in particular, they are smallest when the number of hash functions is 2. With only 1 hash function, there is a high probability that different data are stored in the same cell, leading to higher FPR and FNR.

Figure 5 The influence of the amount of hash functions on FPR
Figure 6 The influence of the amount of hash functions on FNR

When there are 2 to 4 hash functions, the probability of storing different data in the same cell is relatively low, so FPR and FNR are correspondingly low. Beyond that, as the number of hash functions increases, the number of filter cells holding the same data increases as well, which means the load of the filter becomes heavy and FPR and FNR increase instead.
2) The influence of the size of the filter on the error rates
It can be seen from Fig. 7 that the FPR of TSBF becomes smaller as the size of the filter grows, which agrees with the theoretical analysis in Section 4. In this figure, the slight increase of FPR at $4 \times 10^4$ is due to the selection of hash functions. In this experiment, two separate hash functions are selected, both of which might produce collisions, leading to the increase of FPR.
Additionally, Fig. 7 also shows that the FPR of TSBF is much smaller than that of TIBF. The reason lies in the check on reader IDs in TSBF. When the reader ID of data with the same tid changes, namely when the location of the tagged object changes, TSBF can identify this change accurately. TIBF, however, does not check reader IDs. If an object takes a long time to move, TIBF can identify the location movement through the change of time intervals. But if an object's location changes frequently, TIBF may make wrong decisions, resulting in a relatively large FPR.
Fig. 8 presents the change of the FNR of TSBF, which stays within a very small range. This is due to the deletion algorithm, which decreases the randomly selected cells by 1; the probability that the next arriving object is hashed to the same cell is very small. Accordingly, FNR is small as well.
3) The influence of the amount of data streams on the error rates
Suppose the size of the filter is $1 \times 10^4$, the size of the sliding window is 10, and the number of hash functions is 2. Fig. 9 and Fig. 10 show the changes of FPR and FNR as the amount of data streams changes. We can see from Fig. 9 that when the amount of data is smaller than $3 \times 10^5$, the FPR of TSBF increases relatively slowly, while it increases faster when the amount of data exceeds $4 \times 10^5$. The reason lies in the deletion algorithm for expired elements in TSBF, which ensures that error rates do not multiply because of increasing data streams. However, the deletion algorithm is unable to delete expired data once the data stream grows beyond a certain extent, so FPR increases fast after this point. By contrast, the FPR of TIBF generally does not change with the amount of data streams. On the whole, TSBF is more effective than TIBF in terms of performance.
Fig. 10 shows the change of the FNR of TSBF. It is apparent that the FNR of TSBF stays at a very small level because of the periodic deletion of expired elements.
4) The influence of the sliding window size on the error rates
Suppose the size of the filter is $1 \times 10^4$, the amount of RFID data is $1 \times 10^5$, and the number of hash functions is 2. Fig. 11 and Fig. 12 show the change of FPR and FNR with the size of the sliding window.
Fig. 11 reveals that the FPRs of both TSBF and TIBF increase as the size of the sliding window grows. This case differs from the previous ones in that FPR increases although the number of data items generating false positives does not. The reason is that the total amount of data remaining after filtering decreases as the window size grows.
As for Fig. 12, it can be seen that the FNR of TSBF hardly changes with the size of the window, because both the amount of data arriving at the filter and the amount of data generating false negatives decrease.

Conclusions
Focusing on eliminating redundant data in RFID data streams, this paper studies an approximate filtering strategy, the temporal-spatial Bloom filter, for RFID data streams in mobile environments. In the filter, each cell is represented as a two-dimensional array storing the reader's ID and the observation time, which supports location movement of RFID objects. The error rates of the filter, including false positives and false negatives, are analysed theoretically. Experimental comparison and analysis verify that the suggested filter is feasible and effective.
In the future, we will study redundant data filtering for uncertain RFID data streams.

Figure 1 An example of redundant data in RFID data streams

Figure 4 An example of TSBF

Figure 7 The influence of the size of the filter on FPR

Figure 9 The influence of the amount of data streams on FPR

Figure 11 The influence of the sliding window size on FPR