Chapter 1 + Section 2.1 Introduction


Information Retrieval Process

Section 2.2 A taxonomy of Information Retrieval Models

Section 2.3 Retrieval: Ad hoc and Filtering

The following is the formal definition of an IR model from MIR, p.~23.

\newenvironment{proof}[1][Proof]{\noindent\textbf{#1.} }{\ \rule{0.5em}{0.5em}}


An information retrieval model is a quadruple $[D, Q, F, R(q_i, d_j)]$ where
\begin{enumerate}
\item $D$ is a set composed of logical views (or representations) for the {\bf documents} in the collection.
\item $Q$ is a set composed of logical views (or representations) for the user information needs. Such representations are called {\bf queries}.
\item $F$ is a {\bf framework} for modeling document representations, queries, and their relationships.
\item $R(q_i,d_j)$ is a {\bf ranking function} which associates a real number with a query $q_i \in Q$ and a document representation $d_j \in D$. Such ranking defines an ordering among the documents with regard to the query $q_i$.
\end{enumerate}
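
As a small illustration of how the ranking function induces an ordering, consider a query $q_1$ and three documents with the scores below. The values are hypothetical and chosen only for this example, and the sketch assumes that a larger score means a better match:
\[
R(q_1,d_1)=0.2, \qquad R(q_1,d_2)=0.9, \qquad R(q_1,d_3)=0.5 .
\]
Under that assumption the ordering for $q_1$ is $d_2$, $d_3$, $d_1$.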

Section 2.5.1 Basic Concepts of Classic IR



Let $t$ be the number of index terms in the system and $k_i$ be a generic index term. $K=\{k_1,\ldots,k_t\}$ is the set of all index terms. A weight $w_{i,j} > 0$ is associated with each index term $k_i$ of a document $d_j$. For an index term which does not appear in the document text, $w_{i,j}=0$. The document $d_j$ is associated with an index term vector $\vec{d}_{j}=(w_{1,j},w_{2,j},\ldots,w_{t,j})$. Further, let $g_{i}$ be a function that returns the weight associated with the index term $k_{i}$ in any $t$-dimensional vector (i.e., $g_{i}(\vec{d}_{j})=w_{i,j}$).
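
A minimal worked example may help fix the notation. Assume a system with $t=3$ index terms and a document $d_1$ whose text contains $k_1$ and $k_3$ but not $k_2$. The weights $0.5$ and $0.2$ are hypothetical values chosen for illustration:
\[
\vec{d}_{1}=(w_{1,1},w_{2,1},w_{3,1})=(0.5,\,0,\,0.2).
\]
Applying the functions $g_i$ to this vector simply picks out its components:
\[
g_{1}(\vec{d}_{1})=0.5, \qquad g_{2}(\vec{d}_{1})=0, \qquad g_{3}(\vec{d}_{1})=0.2 .
\]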

Section 2.5.2 Boolean Model

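For example, the query $[q = k_a \wedge (k_b \vee \neg k_c)]$ can be written in disjunctive normal form as $[\vec{q}_{dnf}=(1,1,1)\vee (1,1,0)\vee (1,0,0)]$, where each component is a binary weighted vector associated with the tuple $(k_a,k_b,k_c)$.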
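
As a sketch of how this form is used for matching, label the three index terms $k_a$, $k_b$, $k_c$ in component order (the labels and the example documents below are assumptions made for illustration). Each binary triple then stands for one conjunctive component:
\[
\begin{array}{lcl}
(1,1,1) & \mbox{corresponds to} & k_a \wedge k_b \wedge k_c \\
(1,1,0) & \mbox{corresponds to} & k_a \wedge k_b \wedge \neg k_c \\
(1,0,0) & \mbox{corresponds to} & k_a \wedge \neg k_b \wedge \neg k_c
\end{array}
\]
In the Boolean model, a document matches the query when its binary term pattern over these three terms equals one of the conjunctive components. A hypothetical document $d_1$ with pattern $(1,1,0)$ equals the second component and is therefore retrieved. A hypothetical document $d_2$ with pattern $(0,1,0)$ equals none of the three and is not retrieved.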