Thursday, February 21, 2008

Deposition Tools Coming Soon!

Update: Dataset Tools has been released

A set of desktop tools that will make it easier to deposit crystallography datasets into a Fedora 2.2.1 repository is about to be released in beta on this site. Fedora is the underlying persistent storage technology behind digital repository software such as Arrow and Fez.

Each tool is written in Java for cross-platform compatibility and comes with both a GUI version and command-line functionality. Together they aim to make it easier for researchers to package their data, create valid metadata (both technical and repository-based), upload the data, and unpack it again once downloaded. The GUI tools are:

RepositoryPackager

Takes a set of diffraction images, archives them with tar, compresses the archive with bzip2, and then splits the resulting file into parts of a chosen size. This makes the data more space-efficient in a repository, and avoids large files (over 2GB), which were found to crash various server software during upload and storage.

Uses the Apache tar/bzip2 Java libraries and incorporates the accompanying command-line tools TarBzipper and FileSplitter.
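To illustrate the splitting step only, here is a minimal sketch using just the Java standard library. It is not the FileSplitter source: the class name SplitSketch, the method signature, and the zero-padded ".000" part naming are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting a file on a chosen size; not the real FileSplitter.
class SplitSketch {

    // Writes parts of at most partSize bytes next to the source file.
    // Part names (<name>.000, <name>.001, ...) are an assumption for this example.
    static List<Path> split(Path source, long partSize) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(source)) {
            int index = 0;
            // Read ahead so that no empty trailing part is created.
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, partSize));
            while (read != -1) {
                Path part = source.resolveSibling(
                        String.format("%s.%03d", source.getFileName(), index++));
                long written = 0;
                try (OutputStream out = Files.newOutputStream(part)) {
                    while (read != -1) {
                        out.write(buffer, 0, read);
                        written += read;
                        if (written >= partSize) {
                            // This part is full: prefetch the first chunk of the next part.
                            read = in.read(buffer, 0, (int) Math.min(buffer.length, partSize));
                            break;
                        }
                        // Never read more than the space left in the current part.
                        read = in.read(buffer, 0,
                                (int) Math.min(buffer.length, partSize - written));
                    }
                }
                parts.add(part);
            }
        }
        return parts;
    }
}
```

Calling SplitSketch.split(archive, partSize) with a part size comfortably under 2GB would keep every uploaded file below the size that caused problems.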

METSManager

Fedora repositories ingest METS XML files to create and describe the entries being stored. This program uses the Harvard METS Java Toolkit to create a Fedora-compatible METS package containing the values entered for ingestion. Currently, Dublin Core data is created within the METS package to describe basic values such as the title of the object in the repository and its authors. Technical XML metadata relating to the experiment itself is currently stored as plain text in the Description field.

Created METS XML files can be validated against the Fedora-compatible METS schema to test their eligibility for ingestion.
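As a rough idea of what such a validation check can look like, the sketch below uses the standard javax.xml.validation API. It is not taken from METSManager: the class name MetsValidationSketch and the assumption that the Fedora-compatible METS schema is available as a local .xsd file are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical sketch of schema validation; not the METSManager implementation.
class MetsValidationSketch {

    // Returns true if metsFile validates against schemaFile (assumed to be a local XSD).
    // A schema that cannot be loaded is reported as an exception, not as "invalid".
    static boolean isValid(Path metsFile, Path schemaFile) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaFile.toFile());
        Validator validator = schema.newValidator();
        try {
            // With no custom error handler, the first error raises a SAXException.
            validator.validate(new StreamSource(metsFile.toFile()));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

A file that fails this check would be expected to be rejected at ingest, so running it before upload saves a round trip to the repository.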

Future plans include the embedding of technical XML data such as that described in the data section of this site for data harvesting.

Is a GUI implementation of the accompanying command-line tool METSCreator.

DataDepositor

Once the data is organised for upload and a METS XML file has been created, the object and its data are ready to be ingested and uploaded into the repository. This program simplifies that process by scanning a supplied directory (non-recursively, since the structure of repository objects is flat) and uploading each file into a new object described by the METS XML file.
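The non-recursive scan can be sketched as follows. This is an illustration rather than DataDepositor's code: the class DepositScanSketch and the Uploader interface are hypothetical, and the actual call into the Fedora repository is deliberately left behind that interface instead of being guessed at.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a flat directory scan; not the DataDepositor implementation.
class DepositScanSketch {

    // Stand-in for whatever performs the real upload (an assumption for this example).
    interface Uploader {
        void upload(Path file) throws IOException;
    }

    // Lists regular files directly inside dir; subdirectories are not descended into.
    static List<Path> listFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    files.add(entry);
                }
            }
        }
        Collections.sort(files); // stable, predictable upload order
        return files;
    }

    // Hands every file found to the uploader and returns how many were processed.
    static int depositAll(Path dir, Uploader uploader) throws IOException {
        int count = 0;
        for (Path file : listFiles(dir)) {
            uploader.upload(file);
            count++;
        }
        return count;
    }
}
```

Keeping the scan separate from the upload call makes it easy to preview which files would be sent before anything touches the repository.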

Uses the Fedora management API, and is a GUI implementation of the accompanying command-line tool of the same name.

Currently only compatible with Fedora 2.2.1 repositories.

RepositoryUnPackager

Once a split set of files created by RepositoryPackager has been downloaded from the repository, this program re-joins, uncompresses and unarchives the files to restore them to their original form.

Is a GUI implementation of the accompanying command-line tools FileJoiner and UnTarBzipper.
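The re-joining step amounts to concatenating the parts in order. Here is a minimal standard-library sketch, not the FileJoiner source: the class name JoinSketch and the assumption that parts carry a zero-padded three-digit suffix (such as ".000") are made up for this example, since the real naming scheme is not described here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of re-joining split files; not the real FileJoiner.
class JoinSketch {

    // Finds files named <baseName>.NNN in dir (assumed naming) and sorts them by name.
    static List<Path> findParts(Path dir, String baseName) throws IOException {
        Pattern partName = Pattern.compile(Pattern.quote(baseName) + "\\.\\d{3}");
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                if (partName.matcher(entry.getFileName().toString()).matches()) {
                    parts.add(entry);
                }
            }
        }
        Collections.sort(parts); // zero-padded suffixes sort in numeric order
        return parts;
    }

    // Concatenates the parts, in the order given, into target.
    static void join(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```

The joined file would then still need to be bzip2-decompressed and untarred to recover the original diffraction images.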

-

These tools will be open-source and hosted on SourceForge for the community to freely use, view and modify. A user guide will also be written to walk users through the tools and through setting up a compatible Fedora repository.

Expect this site to be updated with beta versions of the tools within the next two weeks.
