Dynamic Max Count
This contains the ideas and notes for a Dynamic Max Count (Dynamic Max-in-time) aggregate operator
Concept
Instead of using Hyper-buckets that have constant densities which can not be updated reasonably using the MaxCountProgramNotes ideas, we propose a probabilistic method where by we define a probability density function in each hyper-bucket. In a sense we are not trying to minimize skew in the bucket creation process, but recognizing and modeling skew in each bucket. The added advantage to this concept is that a region equivalent to a hyper-bucket containing no points may be excluded from the index. Consequently the algorithm searches a smaller space. For example if it becomes necessary to shrink the size of a hyper-bucket to latex2($10~unit^{6}$) size in a latex2($10000~unit^6$) space we will have latex2($1 \times 10^{18}$) buckets. With points concentrated in specific regions we will only have to track a fraction of these in the index.
Here is a description of the index:
- Index defines
- Spatial dimensions
- Bucket Dimensions
- Histogram divisions that determine the level of approximation in each hyper-bucket (see below)
A multi-dimensional probability function preferably a function that uses functions as parameters e.g. latex2(
)
A theory to update, delete or insert points and the distributions based on changes to points.
The above has been implemented in C# 8/15/2005.
Thus we maintain a database of 4-dimensional points that we index using 6-dimensional, probability buckets.
Dynamic Max-Count
The dynamic property of Max-Count is based on the dynamic index structure described above and in my paper (not currently online).
Given two query points that describe a lower and upper moving query hyper-rectangle, we first create a time partition ordering of indicating the time each hyper-bucket enters and leaves the query rectangle. This is done in two stages.
- Entry, exit and valid function time segments are determined for each bucket.
Each bucket's corner verticies are checked for intesection time segments latex2($[entry,exit]$) by solving a linear equation between the query points and the corner point.
Using the list of entry and exit times for each dimension determine the hyper-bucket intersection time segment latex2($[Entry_{bucket},Exit_{bucket}]$) by intersecting the time segments.
Add latex2($Entry_{bucket}$), latex2($Exit_{bucket}$) and all times t such that latex2($Entry_{bucket} \leq t \leq Exit_{bucket}$) to a list to form the valid function time segments for this bucket.
Add the valid function time segments entry and exit values to the time partition order list with a tag indicating that at this time hyper-bucket ID is entering or leaving the query space.
- Create a valid hyper-bucket list and add or delete hyper-buckets to it as you traverse the time partition order list. For each partition add to the parition function the valid hyper-bucket functions.
- Traverse the list again maximizing each partition and keeping track of the maximum.
- return the maximum.