Differences between revisions 19 and 20

Dynamic Max Count

This contains the ideas and notes for a Dynamic Max Count (Dynamic Max-in-time) aggregate operator

Concept

Instead of using Hyper-buckets that have constant densities which can not be updated reasonably using the MaxCountProgramNotes ideas, we propose a probabilistic method where by we define a probability density function in each hyper-bucket. In a sense we are not trying to minimize skew in the bucket creation process, but recognizing and modeling skew in each bucket. The added advantage to this concept is that a region equivalent to a hyper-bucket containing no points may be excluded from the index. Consequently the algorithm searches a smaller space. For example if it becomes necessary to shrink the size of a hyper-bucket to latex2($10~unit^{6}$) size in a latex2($10000~unit^6$) space we will have latex2($1 \times 10^{18}$) buckets. With points concentrated in specific regions we will only have to track a fraction of these in the index.

Here is a description of the index:

Index defines
1. Spatial dimensions
2. Bucket Dimensions
3. Histogram divisions that determine the level of approximation in each hyper-bucket (see below)
4. A multi-dimensional probability function preferably a function that uses functions as parameters e.g. latex2($$p(x_u(t),x_l(t),y_u(t),y_l(t)[,z_u(t),z_l(t)])$$)
A theory to update, delete or insert points and the distributions based on changes to points.

The above has been implemented in C# 8/15/2005.

Thus we maintain a database of 4-dimensional points that we index using 6-dimensional, probability buckets.

Dynamic Max-Count

The dynamic property of Max-Count is based on the dynamic index structure described above and in my paper (not currently online).

Given two query points that describe a lower and upper moving query hyper-rectangle, we first create a time partition ordering of indicating the time each hyper-bucket enters and leaves the query rectangle. This is done in two stages.

Entry, exit and valid function time segments are determined for each bucket.
1. Each bucket's corner verticies are checked for intesection time segments latex2($[entry,exit]$) by solving a linear equation between the query points and the corner point.
2. Using the list of entry and exit times for each dimension determine the hyper-bucket intersection time segment latex2($[Entry_{bucket},Exit_{bucket}]$).
3. Add latex2($Entry_{bucket}$), latex2($Exit_{bucket}$) and all times t such that latex2($Entry_{bucket} \leq t \leq Exit_{bucket}$) to a list to form the valid function time segments for this bucket.
Add the valid function time segments entry and exit values to the time partition order list with a tag indicating that at this time hyper-bucket ID is entering or leaving the query space.
Create a valid hyper-bucket list and add or delete hyper-buckets to it as you traverse the time partition order list. For each partition add to the parition function the valid hyper-bucket functions.
Traverse the list again maximizing each partition and keeping track of the maximum.
return the maximum.

from (a) are by intersecting the time segments. by checking the corner vertices entering and leaving the query space in each dimension. All of the times that fall between or on the least time and greatest time when all dimensions are in the query space are selected. These time segments are added to the time partition list along with a hyper-bucket identifier.

-  ⇤ ← Revision 19 as of 2005-08-17 18:11:33 → 
  Size: 3655
  Editor: yakko
  Comment:
+   ← Revision 20 as of 2005-08-17 18:12:07 → ⇥
  Size: 3655
  Editor: yakko
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 13:
-. Spatial dimensions
      1. Bucket Dimensions
      1. Histogram divisions that determine the level of approximation in each hyper-bucket (see below)
      1. A multi-dimensional probability function preferably a function that uses functions as parameters e.g. [[latex2($$p(x_u(t),x_l(t),y_u(t),y_l(t)[,z_u(t),z_l(t)])$$)]]
+      a. Spatial dimensions
      a. Bucket Dimensions
      a. Histogram divisions that determine the level of approximation in each hyper-bucket (see below)
      a. A multi-dimensional probability function preferably a function that uses functions as parameters e.g. [[latex2($$p(x_u(t),x_l(t),y_u(t),y_l(t)[,z_u(t),z_l(t)])$$)]]
 Line 30:
-. Each bucket's corner verticies are checked for intesection time segments [[latex2($[entry,exit]$)]] by solving a linear equation between the query points and the corner point.
      1. Using the list of entry and exit times for each dimension determine the hyper-bucket intersection time segment [[latex2($[Entry_{bucket},Exit_{bucket}]$)]].
      1. Add [[latex2($Entry_{bucket}$)]],  [[latex2($Exit_{bucket}$)]] and all times ''t'' such that [[latex2($Entry_{bucket} \leq t \leq Exit_{bucket}$)]] to a list to form the valid function time segments for this bucket.
+      a. Each bucket's corner verticies are checked for intesection time segments [[latex2($[entry,exit]$)]] by solving a linear equation between the query points and the corner point.
      a. Using the list of entry and exit times for each dimension determine the hyper-bucket intersection time segment [[latex2($[Entry_{bucket},Exit_{bucket}]$)]].
      a. Add [[latex2($Entry_{bucket}$)]],  [[latex2($Exit_{bucket}$)]] and all times ''t'' such that [[latex2($Entry_{bucket} \leq t \leq Exit_{bucket}$)]] to a list to form the valid function time segments for this bucket.

Diff for "DynamicMaxCount"

Dynamic Max Count

Concept

Dynamic Max-Count