The record model described in this chapter applies to the fundamental, structured XML record type alvis, introduced in Section 2.5.3, “ALVIS Record Model and Filter Module”. The ALVIS XML record model is experimental, and its inner workings might change in future releases of the Zebra Information Server.
This filter has been developed under the ALVIS project funded by the European Community under the "Information Society Technologies" Program (2002-2006).
The experimental, loadable Alvis XML/XSLT filter module mod-alvis.so is packaged in the GNU/Debian package libidzebra1.4-mod-alvis.
It is invoked by the zebra.cfg configuration statement
recordtype.xml: alvis.db/filter_alvis_conf.xml
In this example, the filter is applied to all data files with suffix *.xml, and the Alvis XSLT filter configuration file is found at the path db/filter_alvis_conf.xml.
The Alvis XSLT filter configuration file must be valid XML. It might look like this (this example is used for indexing and display of OAI harvested records):
<?xml version="1.0" encoding="UTF-8"?>
<schemaInfo>
  <schema name="identity" stylesheet="xsl/identity.xsl" />
  <schema name="index" identifier="http://indexdata.dk/zebra/xslt/1"
          stylesheet="xsl/oai2index.xsl" />
  <schema name="dc" stylesheet="xsl/oai2dc.xsl" />
  <!-- use split level 2 when indexing whole OAI Record lists -->
  <split level="2"/>
</schemaInfo>
All named stylesheets defined inside schema element tags are for presentation after search, including the indexing stylesheet (which is a great debugging help). The names defined in the name attributes must be unique; these are the literal schema or element set names used in SRW, SRU and Z39.50 protocol queries.
The paths in the stylesheet attributes are either relative to Zebra's working directory or absolute from the file system root.
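As an illustration of a presentation stylesheet, the identity schema in the configuration above could be backed by a plain XSLT 1.0 identity transform, which returns the stored record unchanged. The following is a minimal sketch under that assumption; the xsl/identity.xsl file actually shipped with the example may be written differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of xsl/identity.xsl; the shipped file may differ. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Copy every attribute and node unchanged, recursing into children. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Requesting the element set name identity would then return the record as it was stored, which is convenient when checking what the XML Reader actually split out of the input.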
The <split level="2"/> element decides where the XML Reader splits collections of records into individual records, which are then loaded into DOM and have the indexing XSLT stylesheet applied.
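To make the split level concrete, here is a sketch of an OAI-PMH ListRecords file annotated with element depths. It assumes that the split level counts element depth with the document element at depth 0, which is consistent with the configuration comment recommending level 2 for whole OAI record lists. The second record identifier is made up for illustration, and the responseDate, request and metadata content are left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- depth 0: document element (assumed numbering, see above) -->
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <!-- depth 1: responseDate and request elements omitted here -->
  <ListRecords>
    <!-- depth 2: with split level 2, each record element
         becomes one individual record -->
    <record>
      <header>
        <identifier>oai:JTRS:CP-3290---Volume-I</identifier>
        <datestamp>2004-07-09</datestamp>
        <setSpec>jtrs</setSpec>
      </header>
      <metadata><!-- Dublin Core payload omitted --></metadata>
    </record>
    <record>
      <header>
        <!-- hypothetical identifier, for illustration only -->
        <identifier>oai:example:second-record</identifier>
        <datestamp>2004-07-09</datestamp>
      </header>
      <metadata><!-- Dublin Core payload omitted --></metadata>
    </record>
  </ListRecords>
</OAI-PMH>
```

Under this assumption, each record element is handed to the indexing stylesheet as its own document, so the stylesheet can address it as the document element.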
There must be exactly one indexing XSLT stylesheet, which is defined by the magic attribute identifier="http://indexdata.dk/zebra/xslt/1".
When indexing, an XML Reader is invoked to split the input files into suitable record XML pieces. Each record piece is then transformed to an XML DOM structure, which is essentially the record model. Only XSLT transformations can be applied during indexing, search and retrieval. Consequently, output formats are restricted to whatever XSLT can deliver from the record XML structure, be it other XML formats, HTML, or plain text. If your libxslt1 is built with EXSLT support, you can use this functionality inside the Alvis filter configuration XSLT stylesheets.
The output of the indexing XSLT stylesheets must contain certain elements in the magic xmlns:z="http://indexdata.dk/zebra/xslt/1" namespace. The output of the XSLT indexing transformation is then parsed using DOM methods, and the contained instructions are performed on the magic elements and their subtrees.
For example, the output of the command
xsltproc xsl/oai2index.xsl one-record.xml
might look like this:
<?xml version="1.0" encoding="UTF-8"?>
<z:record xmlns:z="http://indexdata.dk/zebra/xslt/1"
          z:id="oai:JTRS:CP-3290---Volume-I"
          z:rank="47896"
          z:type="update">
  <z:index name="oai:identifier" type="0">oai:JTRS:CP-3290---Volume-I</z:index>
  <z:index name="oai:datestamp" type="0">2004-07-09</z:index>
  <z:index name="oai:setspec" type="0">jtrs</z:index>
  <z:index name="dc:all" type="w">
    <z:index name="dc:title" type="w">Proceedings of the 4th International Conference and Exhibition: World Congress on Superconductivity - Volume I</z:index>
    <z:index name="dc:creator" type="w">Kumar Krishen and *Calvin Burnham, Editors</z:index>
  </z:index>
</z:record>
This means the following: from the original XML file one-record.xml (or from the XML record DOM of the same form coming from a split input file), the indexing stylesheet produces an indexing XML record, which is defined by the record element in the magic namespace xmlns:z="http://indexdata.dk/zebra/xslt/1".
Zebra uses the content of z:id="oai:JTRS:CP-3290---Volume-I" as internal record ID, and, in case static ranking is set, the content of z:rank="47896" as static rank. Following the discussion in Section 9, “Relevance Ranking and Sorting of Result Sets”, we see that this record is internally ordered lexicographically according to the value of the string oai:JTRS:CP-3290---Volume-I47896.
The type of action performed during indexing is defined by z:type="update", with recognized values insert, update, and delete.
In this example, the following literal indexes are constructed:
oai:identifier
oai:datestamp
oai:setspec
dc:all
dc:title
dc:creator
where the indexing type is defined in the type attribute (any value from the standard configuration file default.idx will do). Finally, any text() node content recursively contained inside an index element will be filtered through the appropriate charmap for character normalization, and will be inserted in the index.
Specific to this example, we see that the single word oai:JTRS:CP-3290---Volume-I will be inserted literally, byte for byte and without any form of character normalization, into the index named oai:identifier. The text Kumar Krishen and *Calvin Burnham, Editors will be inserted into the index dc:creator using the w character normalization defined in default.idx (that is, after character normalization the index will hold the individual words kumar, krishen, and, calvin, burnham, and editors). Finally, both the texts Proceedings of the 4th International Conference and Exhibition: World Congress on Superconductivity - Volume I and Kumar Krishen and *Calvin Burnham, Editors will be inserted into the index dc:all using the same character normalization map w.
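An indexing stylesheet that yields output of the kind discussed above could be sketched as follows. This is a hypothetical illustration, not the xsl/oai2index.xsl shipped with the example: it assumes that each split record is an OAI-PMH record element carrying Dublin Core metadata in the usual namespaces, and it leaves out z:rank because the source of the static rank value is not described here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch, not the shipped xsl/oai2index.xsl. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:z="http://indexdata.dk/zebra/xslt/1"
                xmlns:oai="http://www.openarchives.org/OAI/2.0/"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                exclude-result-prefixes="oai dc">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- One split OAI record in, one z:record out (z:rank omitted). -->
  <xsl:template match="/oai:record">
    <z:record z:id="{normalize-space(oai:header/oai:identifier)}"
              z:type="update">

      <!-- Header fields, indexed with type 0 as in the example output. -->
      <z:index name="oai:identifier" type="0">
        <xsl:value-of select="normalize-space(oai:header/oai:identifier)"/>
      </z:index>
      <z:index name="oai:datestamp" type="0">
        <xsl:value-of select="normalize-space(oai:header/oai:datestamp)"/>
      </z:index>
      <xsl:for-each select="oai:header/oai:setSpec">
        <z:index name="oai:setspec" type="0">
          <xsl:value-of select="normalize-space(.)"/>
        </z:index>
      </xsl:for-each>

      <!-- Nested elements: the text lands in the field index and in dc:all. -->
      <z:index name="dc:all" type="w">
        <xsl:for-each select="oai:metadata//dc:title">
          <z:index name="dc:title" type="w">
            <xsl:value-of select="."/>
          </z:index>
        </xsl:for-each>
        <xsl:for-each select="oai:metadata//dc:creator">
          <z:index name="dc:creator" type="w">
            <xsl:value-of select="."/>
          </z:index>
        </xsl:for-each>
      </z:index>

    </z:record>
  </xsl:template>

</xsl:stylesheet>
```

Nesting the dc:title and dc:creator index elements inside dc:all relies on the rule stated above that text() content is collected recursively, so no text has to be emitted twice.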
Finally, this example configuration can be queried using PQF queries, either transported by Z39.50 (here using yaz-client)
Z> open localhost:9999
Z> elem dc
Z> form xml
Z>
Z> f @attr 1=dc:creator Kumar
Z> scan @attr 1=dc:creator adam
Z>
Z> f @attr 1=dc:title @attr 4=2 "proceeding congress superconductivity"
Z> scan @attr 1=dc:title abc
or the proprietary extensions x-pquery and x-pScanClause to SRU and SRW
http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=%40attr+1%3Ddc%3Acreator+%40attr+4%3D6+%22the
http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr+1=dc:date+@attr+4=2+a
See Chapter 12, The SRU/SRW Server for more information on SRU/SRW configuration, and Section 11, “YAZ Frontend Virtual Hosts” or the YAZ manual CQL section for the details of the YAZ frontend server CQL configuration.
Notice that there are no *.abs, *.est, *.map, or other GRS-1 filter configuration files involved in this process, and that the literal index names are used during search and retrieval.