This example shows how Zebra can be used with absolutely minimal configuration to index a body of XML documents, and search them using XPath expressions to specify access points.
Go to the examples/zthes subdirectory of the distribution archive. There you will find a Makefile that will populate the records subdirectory with a file of Zthes records representing a taxonomic hierarchy of dinosaurs. (The records are generated from the family tree in the file dino.tree.)
Type make records/dino.xml to make the XML data file. (Or you could just type make dino to build the XML data file, create the database and populate it with the taxonomic records all in one shot - but then you wouldn't learn anything, would you? :-)
Now we need to create a Zebra database to hold and index the XML records. We do this with the Zebra indexer, zebraidx, which is driven by the zebra.cfg configuration file. For our purposes, we don't need any special behaviour - we can use the defaults - so we can start with a minimal file that just tells zebraidx where to find the default indexing rules, and how to parse the records:
profilePath: .:../../tab
recordType: grs.sgml
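To make the shape of that file concrete, here is a small Python sketch that reads the two settings back. The helper name parse_zebra_cfg is hypothetical and not part of Zebra; the sketch assumes the file consists of "name: value" lines and that blank lines and lines starting with # can be skipped. It splits on the first colon only, because the profilePath value is itself a colon-separated list of directories.

```python
# The minimal configuration from the text, one setting per line.
MINIMAL_CFG = """\
profilePath: .:../../tab
recordType: grs.sgml
"""


def parse_zebra_cfg(text):
    """Parse 'name: value' lines into a dict (illustrative helper, not a Zebra API)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines carry no settings.
        if not line or line.startswith("#"):
            continue
        # Split on the FIRST colon only; the value may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'name: value' line: {raw!r}")
        settings[name.strip()] = value.strip()
    return settings


settings = parse_zebra_cfg(MINIMAL_CFG)
# The profile path is a list of directories searched for indexing rules.
print(settings["profilePath"].split(":"))
print(settings["recordType"])
```

None of this is needed to run Zebra: zebraidx reads the two-line file directly.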
That's all you need for a minimal Zebra configuration. Now you can roll the XML records into the database and build the indexes:
zebraidx update records
Now start the server. Like the indexer, its behaviour is controlled by the zebra.cfg file; and like the indexer, it works just fine with this minimal configuration.
zebrasrv
By default, the server listens on TCP port 9999, although this can easily be changed - see Section 1, “Running the Z39.50 Server (zebrasrv)”.
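If you want to confirm that something is listening before you point a client at it, a quick check along the following lines may help. This is only a sketch: port_is_open is a hypothetical helper, the host and port defaults are assumptions matching the default setup described above, and a successful TCP connection shows only that a process accepts connections on the port, not that it speaks Z39.50.

```python
import socket


def port_is_open(host="localhost", port=9999, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError on refusal, timeout, or lookup failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("zebrasrv reachable:", port_is_open())
```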
Now you can use the Z39.50 client program of your choice to execute XPath-based boolean queries and fetch the XML records that satisfy them:
$ yaz-client @:9999
Connecting...Ok.
Z> find @attr 1=/Zthes/termName Sauroposeidon
Number of hits: 1
Z> format xml
Z> show 1
<Zthes>
 <termId>22</termId>
 <termName>Sauroposeidon</termName>
 <termType>PT</termType>
 <termNote>The tallest known dinosaur (18m)</termNote>
 <relation>
  <relationType>BT</relationType>
  <termId>21</termId>
  <termName>Brachiosauridae</termName>
  <termType>PT</termType>
 </relation>
 <idzebra xmlns="http://www.indexdata.dk/zebra/">
  <size>300</size>
  <localnumber>23</localnumber>
  <filename>records/dino.xml</filename>
 </idzebra>
</Zthes>
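Once a record like this has been retrieved, it can be picked apart with any XML library. The Python sketch below uses the standard xml.etree.ElementTree module on a copy of the record shown above; it runs purely on the client side and says nothing about how Zebra itself evaluates the /Zthes/termName access point. The only subtlety is that the idzebra element and its children live in the http://www.indexdata.dk/zebra/ namespace, so they need a namespace mapping (the prefix z is an arbitrary choice for this example).

```python
import xml.etree.ElementTree as ET

# The record returned by "show 1" in the session above.
RECORD = """\
<Zthes>
 <termId>22</termId>
 <termName>Sauroposeidon</termName>
 <termType>PT</termType>
 <termNote>The tallest known dinosaur (18m)</termNote>
 <relation>
  <relationType>BT</relationType>
  <termId>21</termId>
  <termName>Brachiosauridae</termName>
  <termType>PT</termType>
 </relation>
 <idzebra xmlns="http://www.indexdata.dk/zebra/">
  <size>300</size>
  <localnumber>23</localnumber>
  <filename>records/dino.xml</filename>
 </idzebra>
</Zthes>
"""

# Arbitrary prefix for the namespace declared on <idzebra>.
ZEBRA_NS = {"z": "http://www.indexdata.dk/zebra/"}

root = ET.fromstring(RECORD)

# Direct children of <Zthes>; the nested <relation>/<termName> is not matched here.
term_name = root.findtext("termName")
term_note = root.findtext("termNote")

# Each <relation> pairs a relation type with the related term's name.
relations = [
    (rel.findtext("relationType"), rel.findtext("termName"))
    for rel in root.findall("relation")
]

# Zebra's bookkeeping block is namespaced, so qualify every step of the path.
source_file = root.findtext("z:idzebra/z:filename", namespaces=ZEBRA_NS)

print(term_name, relations, source_file)
```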
Now wasn't that nice and easy?