| Size: 5368 Comment:  | Size: 3948 Comment:  | 
| Deletions are marked like this. | Additions are marked like this. | 
| Line 8: | Line 8: | 
| Instead of using Hyper-buckets that have constant densities which can not be updated reasonably using the MaxCountProgramNotes ideas, we propose a probabilistic method where by we define a probability density function in each hyper-bucket. In a sense we are not trying to minimize skew in the bucket creation process, but recognizing and modeling skew in each bucket. The added advantage to this concept is that a region equivalent to a hyper-bucket containing no points may be excluded from the index. Consequently the algorithm searches a smaller space. For example if it becomes necessary to shrink the size of a hyper-bucket to <<latex($10~unit^{6}$)>> size in a <<latex($10000~unit^6$)>> space we will have <<latex($1 \times 10^{18}$)>> buckets. With points concentrated in specific regions we will only have to track a fraction of these in the index. | Instead of using Hyper-buckets that have constant densities which can not be updated reasonably using the MaxCountProgramNotes ideas, we propose a probabilistic method where by we define a probability density function in each hyper-bucket. In a sense we are not trying to minimize skew in the bucket creation process, but recognizing and modeling skew in each bucket. The added advantage to this concept is that a region equivalent to a hyper-bucket containing no points may be excluded from the index. Consequently the algorithm searches a smaller space. For example if it becomes necessary to shrink the size of a hyper-bucket to $$10 unit^6$$ size in a 10000 unit^6^ space we will have 1$$\times$$10^18^ buckets. With points concentrated in specific regions we will only have to track a fraction of these in the index. | 
| Line 16: | Line 16: | 
| a. A multi-dimensional probability function preferably a function that uses functions as parameters e.g. <<latex($p(x_u(t),x_l(t),y_u(t),y_l(t)[,z_u(t),z_l(t)])$)>> | a. A multi-dimensional probability function preferably a function that uses functions as parameters e.g. $$p(x_u(t),x_l(t),y_u(t),y_l(t)[,z_u(t),z_l(t)])$$ | 
| Line 30: | Line 30: | 
| a. Each bucket's corner verticies are checked for intesection time segments <<latex($[entry,exit]$)>> by solving a linear equation between the query points and the corner point. a. Using the list of entry and exit times for each dimension determine the hyper-bucket intersection time segment <<latex($[Entry_{bucket},Exit_{bucket}]$)>> by intersecting the time segments. a. Add <<latex($Entry_{bucket}$)>>, <<latex($Exit_{bucket}$)>> and all times ''t'' such that <<latex($Entry_{bucket} \leq t \leq Exit_{bucket}$)>> to a list to form the valid function time segments for this bucket. | a. Each bucket's corner verticies are checked for intesection time segments $$[entry,exit]$$ by solving a linear equation between the query points and the corner point. a. Using the list of entry and exit times for each dimension determine the hyper-bucket intersection time segment $$[Entry_{bucket},Exit_{bucket}]$$ by intersecting the time segments. a. Add $$Entry_{bucket}$$, $$Exit_{bucket}$$ and all times ''t'' such that $$Entry_{bucket} \leq t \leq Exit_{bucket}$$ to a list to form the valid function time segments for this bucket. | 
| Line 40: | Line 40: | 
| {{{#!latex \usepackage{amsmath}% \setcounter{MaxMatrixCols}{30}% \usepackage{amsfonts}% \usepackage{amssymb}% \usepackage{graphicx} \usepackage{geometry} \newtheorem{theorem}{Theorem} \newtheorem{acknowledgement}[theorem]{Acknowledgement} \newtheorem{algorithm}[theorem]{Algorithm} \newtheorem{axiom}[theorem]{Axiom} \newtheorem{case}[theorem]{Case} \newtheorem{claim}[theorem]{Claim} \newtheorem{conclusion}[theorem]{Conclusion} \newtheorem{condition}[theorem]{Condition} \newtheorem{conjecture}[theorem]{Conjecture} \newtheorem{corollary}[theorem]{Corollary} \newtheorem{criterion}[theorem]{Criterion} \newtheorem{definition}[theorem]{Definition} \newtheorem{example}[theorem]{Example} \newtheorem{exercise}[theorem]{Exercise} \newtheorem{lemma}[theorem]{Lemma} \newtheorem{notation}[theorem]{Notation} \newtheorem{problem}[theorem]{Problem} \newtheorem{proposition}[theorem]{Proposition} \newtheorem{remark}[theorem]{Remark} \newtheorem{solution}[theorem]{Solution} \newtheorem{summary}[theorem]{Summary} \newenvironment{proof}[1][Proof]{\noindent\textbf{#1.} }{\ \rule{0.5em}{0.5em}} \geometry{left=0.5in,right=0.5in,top=0.5in,bottom=0.5in} | Essentially we are going to have a constant $c$ that is a multiplied by function f which is made up of $$x[j]$$ and $$a[j]$$. Thus we have the density function F as follows: | 
| Line 71: | Line 42: | 
| %%end-prologue%% Essentially we are going to have a constant $c$ that is a multiplied by function f which is made up of $x[j]$ and $a[j]$. Thus we have the density function F as follows: \begin{equation} F = c(a[0] x_0 + b[0])(a[1] x_1 + b[1])...(a[5] x_5 + b[5]) \end{equation} | $$$F = c(a[0] x_0 + b[0])(a[1] x_1 + b[1])...(a[5] x_5 + b[5])$$$ | 
| Line 81: | Line 45: | 
| \begin{equation} \label{eq:DensityFunction} F_i = \frac{b_i f_i}{n \mathop{\displaystyle\int}\limits_{B_i} f_i d \phi} \end{equation} $f_i$ is defined as follows: \begin{equation} f_i = \mathop{\displaystyle\prod}\limits_j (f_{i,j} + p_i) \end{equation} $f_{i,j}$ is a trend function which we will implement as a linear function: \begin{equation} f_{i,j} = a x_j + c_j \end{equation} where $c_j$ is a constant. Thus the form of the Density function is | $$$F_i = \frac{b_i f_i}{n \mathop{\displaystyle\int}\limits_{B_i} f_i d \phi}$$$ $$f_i$$ is defined as follows: $$$f_i = \mathop{\displaystyle\prod}\limits_j (f_{i,j} + p_i)$$$ $$f_{i,j}$$ is a trend function which we will implement as a linear function: $$$f_{i,j} = a x_j + c_j$$$ where $$c_j$$ is a constant. Thus the form of the Density function is | 
| Line 94: | Line 58: | 
| }}} | 
Dynamic Max Count
This contains the ideas and notes for a Dynamic Max Count (Dynamic Max-in-time) aggregate operator
Concept
Instead of using Hyper-buckets that have constant densities which can not be updated reasonably using the MaxCountProgramNotes ideas, we propose a probabilistic method where by we define a probability density function in each hyper-bucket. In a sense we are not trying to minimize skew in the bucket creation process, but recognizing and modeling skew in each bucket. The added advantage to this concept is that a region equivalent to a hyper-bucket containing no points may be excluded from the index. Consequently the algorithm searches a smaller space. For example if it becomes necessary to shrink the size of a hyper-bucket to $$10 unit6$$ size in a 10000 unit6 space we will have 1$$\times$$1018^ buckets. With points concentrated in specific regions we will only have to track a fraction of these in the index.
Here is a description of the index:
- Index defines - Spatial dimensions
- Bucket Dimensions
- Histogram divisions that determine the level of approximation in each hyper-bucket (see below)
- A multi-dimensional probability function preferably a function that uses functions as parameters e.g. $$p(x_u(t),x_l(t),y_u(t),y_l(t)[,z_u(t),z_l(t)])$$
 
- A theory to update, delete or insert points and the distributions based on changes to points. 
The above has been implemented in C# 8/15/2005.
Thus we maintain a database of 4-dimensional points that we index using 6-dimensional, probability buckets.
Dynamic Max-Count
The dynamic property of Max-Count is based on the dynamic index structure described above and in my paper (not currently online).
Given two query points that describe a lower and upper moving query hyper-rectangle, we first create a time partition ordering of indicating the time each hyper-bucket enters and leaves the query rectangle. This is done in two stages.
- Entry, exit and valid function time segments are determined for each bucket. - Each bucket's corner verticies are checked for intesection time segments $$[entry,exit]$$ by solving a linear equation between the query points and the corner point.
- Using the list of entry and exit times for each dimension determine the hyper-bucket intersection time segment $$[Entry_{bucket},Exit_{bucket}]$$ by intersecting the time segments.
- Add $$Entry_{bucket}$$, $$Exit_{bucket}$$ and all times t such that $$Entry_{bucket} \leq t \leq Exit_{bucket}$$ to a list to form the valid function time segments for this bucket. 
 
- Add the valid function time segments entry and exit values to the time partition order list with a tag indicating that at this time hyper-bucket ID is entering or leaving the query space. 
- Create a valid hyper-bucket list and add or delete hyper-buckets to it as you traverse the time partition order list. For each partition add to the parition function the valid hyper-bucket functions.
- Traverse the list again maximizing each partition and keeping track of the maximum.
- return the maximum.
Density Functions
Essentially we are going to have a constant $c$ that is a multiplied by function f which is made up of $$x[j]$$ and $$a[j]$$. Thus we have the density function F as follows:
$$$F = c(a[0] x_0 + b[0])(a[1] x_1 + b[1])...(a[5] x_5 + b[5])$$$
In the Journal paper the Density Function is given by the following:
$$$F_i = \frac{b_i f_i}{n \mathop{\displaystyle\int}\limits_{B_i} f_i d \phi}$$$
$$f_i$$ is defined as follows:
$$$f_i = \mathop{\displaystyle\prod}\limits_j (f_{i,j} + p_i)$$$
$$f_{i,j}$$ is a trend function which we will implement as a linear function:
$$$f_{i,j} = a x_j + c_j$$$
where $$c_j$$ is a constant. Thus the form of the Density function is given by (2).
