DeepArc

Last published: 2005-01-18

DeepArc was developed by the National Library of France (BnF) with XQuark to transform relational database content into XML for archiving purposes. It is part of the International Internet Preservation Consortium (IIPC) tool suite for web archiving.

Presentation

DeepArc is a graphical editor that allows users to establish a mapping between an existing relational data model and one or more target data models, specified as XML Schemas. The tool relies on XQuery to extract information from the relational database and structure it according to the target schemas. It can then export the database content as an XML document conforming to the chosen schemas.
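To give a concrete idea of the transformation involved, here is a minimal Java sketch, using only the JDK's DOM and serialization APIs, that applies a column-to-element mapping to relational rows and writes the result as XML. It is not DeepArc's code: DeepArc defines mappings graphically and executes them as XQuery. The column names (`ID`, `TITLE`, `AUTHOR`), the target element names and the class name `RowToXmlSketch` are hypothetical. In a real extraction the rows would come from a JDBC query; here they are held in memory so the sketch runs without a database on a recent JDK.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RowToXmlSketch {

    // Hypothetical mapping: source column name -> element name in the target schema.
    // LinkedHashMap keeps insertion order, matching a sequence declared in the schema.
    static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("ID", "identifier");
        MAPPING.put("TITLE", "title");
        MAPPING.put("AUTHOR", "creator");
    }

    // Turns rows (column name -> value) into <records><record>...</record></records>.
    static String toXml(List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("records");
            doc.appendChild(root);
            for (Map<String, String> row : rows) {
                Element recordEl = doc.createElement("record");
                for (Map.Entry<String, String> m : MAPPING.entrySet()) {
                    String value = row.get(m.getKey());
                    if (value == null) {
                        continue; // a NULL (or absent) column produces no element
                    }
                    Element field = doc.createElement(m.getValue());
                    field.setTextContent(value); // the serializer escapes special characters such as &
                    recordEl.appendChild(field);
                }
                root.appendChild(recordEl);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML generation failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("ID", "42");
        row.put("TITLE", "Cartes & plans");
        row.put("AUTHOR", "Anonyme");
        System.out.println(toXml(List.of(row)));
    }
}
```

Keeping the mapping as data, separate from the extraction loop, mirrors the idea of an editable mapping: changing the target vocabulary only means changing the map, not the code that walks the rows.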

Prerequisites

DeepArc has been run on Linux, Windows and Mac OS X platforms with:

  1. Sun JVM 1.4.2 (http://java.sun.com/j2se/1.4.2/download.html) or later
  2. one of the following DBMSs: Oracle8i or 9i, SQL Server 2000, MySQL 3.23.x or later, Sybase 11.9.2
  3. a JDBC 2 driver to connect to the database (the MySQL JDBC driver is packaged with DeepArc)

Downloads

All releases (sources and binaries) are available on the SourceForge download page: http://sourceforge.net/projects/deeparc/

Context

DeepArc was developed to archive a very particular kind of deep website, which may be called a documentary gateway. The purpose of a documentary gateway is to give access to a large and growing number of digital objects (books, articles, images, etc.). Access may be free or not, open or restricted to subscribers.
Rather than being listed in a set of indexes (i.e. static HTML files), object descriptions and IDs are stored as records in a hierarchical, object-oriented or, more often, relational database, while the objects themselves are stored in a file system. Users are offered a form-based search interface where they can enter keywords. Submitting keywords triggers scripts that query the database management system, compute the results and build links to the objects.
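The gateway mechanism described above can be sketched in a few lines of Java. This is an illustration only, not code from DeepArc or from any actual gateway: the `DocRecord` type, the `/objects/` URL prefix and the sample rows are hypothetical, and a case-insensitive keyword match over in-memory rows stands in for the SQL query a real script would send to the DBMS. It targets a recent JDK (records require Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;

public class GatewaySearchSketch {

    // Hypothetical database row: the description and id are stored in the DBMS,
    // while the digital object itself is a file referenced by a relative path.
    record DocRecord(String id, String title, List<String> keywords, String filePath) {}

    // Hypothetical base URL under which the web server exposes the object files.
    static final String OBJECT_BASE_URL = "/objects/";

    // Plays the role of the gateway script: select the matching records
    // (a case-insensitive keyword match instead of a SQL query) and build
    // a link to the file of each matching object.
    static List<String> search(List<DocRecord> table, String keyword) {
        List<String> links = new ArrayList<>();
        for (DocRecord r : table) {
            boolean match = r.keywords().stream().anyMatch(k -> k.equalsIgnoreCase(keyword));
            if (match) {
                // The record-to-file link exists only here, computed at query time.
                links.add(OBJECT_BASE_URL + r.filePath());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<DocRecord> table = List.of(
            new DocRecord("1", "Map of Paris", List.of("map", "Paris"), "maps/0001.tif"),
            new DocRecord("2", "Letter to a publisher", List.of("correspondence"), "letters/0002.pdf"));
        System.out.println(search(table, "paris")); // prints [/objects/maps/0001.tif]
    }
}
```

What the sketch highlights is where the link lives: the association between a record and its file is computed by the script when a query is submitted, so capturing only the pages returned for a few queries would not capture the full set of records or their links.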

Archiving documentary gateways has to be done in two stages:

But archiving the database raises several problems:

  1. the term "database" is commonly used to refer to three components: 1/ the database content, a set of records spread over tables; 2/ the database management system (DBMS), i.e. the piece of software used to store and manage the content; 3/ the application, i.e. scripts usually written in a wide variety of languages;
  2. database formats are "closed" and proprietary;
  3. both the structure and the rendering of the data depend on specific software that runs on specific operating systems;
  4. scripts are the only way to link a document to its associated information.

To solve these problems, the database structure and content need to be migrated to an open, structured format that creates or retains the link between each document and its information. That is DeepArc's purpose.
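As an illustration of what an open format buys, the following Java sketch reads a small archived XML document with nothing but the JDK's XML parser and XPath, and recovers which description belongs to a given object file. The element and attribute names (`record`, `title`, `object/@href`), the sample content and the class name are hypothetical; they do not describe DeepArc's actual output, whose structure depends on the target schema chosen by the Archive.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ArchivedLinkSketch {

    // Hypothetical archived output: each record keeps its description together
    // with the relative path of the digital object delivered alongside the XML.
    static final String ARCHIVE =
          "<records>"
        + "<record><identifier>1</identifier><title>Map of Paris</title>"
        + "<object href=\"maps/0001.tif\"/></record>"
        + "<record><identifier>2</identifier><title>Letter to a publisher</title>"
        + "<object href=\"letters/0002.pdf\"/></record>"
        + "</records>";

    // Resolves an object file back to its description using only a standard
    // XML parser and XPath: no DBMS and no gateway script is involved.
    // Returns "" when no record references the given path.
    static String titleForObject(String xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $path as an XPath variable instead of concatenating it into the expression.
            xpath.setXPathVariableResolver(name -> "path".equals(name.getLocalPart()) ? path : null);
            return xpath.evaluate("/records/record[object/@href = $path]/title", doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not read archive", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleForObject(ARCHIVE, "maps/0001.tif")); // prints Map of Paris
    }
}
```

Binding the path as an XPath variable, rather than splicing it into the expression string, keeps the query valid even when a file name contains a quote character.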

DeepArc has to be installed by the web publisher. The person with the most precise technical understanding of the database structure and data model performs the mapping to the target data model specified by the Archive, triggers the extraction of the content and delivers the resulting XML document to the Archive along with the digital objects.

More information on web archiving is available from:
BnF website: http://www.bnf.fr/pages/infopro/depotleg/dli_intro.htm
IIPC website: http://netpreserve.org

Contacts

Functional: Sara Aubry -
Technical: Younès Hafri -
