S3DB - A semantic tool for data integration in the Life Sciences: 2006

Wednesday, December 06, 2006

Incubating Ontologies

To incubate an ontology is to create a data model that integrates a particular set of data in a common structure, without a priori assumptions about what the final product should look like. Turning biological data into manageable digital formats demands annotation of the data, which often happens in a sloppy, disorganized manner. Structures for organizing it often take considerable time to develop and are not always fully useful. Agreement among all parties involved is also hard to achieve.
Incubating an ontology is therefore the means by which unstructured, annotated data slowly and steadily gains organization, through maintenance and small additions, until it turns into a useful ontology: a strong infrastructure, built from inferences and corrections on the initial annotations, that can be distributed and shared, and that serves the purpose of bridging the informational gap between universal data storage and mining.
By analogy, such ontology development is somewhat like building a bridge between two points. The first bridge would be "sloppy", just good enough to allow a rough passage between the two points. With time and usage, the bridge could then be elaborated: the materials used to build it would change and become sturdier, until one could cross it easily and safely. Of course, not all first-stage bridges reach this last stage, but those that do will hardly ever lose their importance as landmarks.

Monday, November 06, 2006

Why go Semantic?

The mindswap weblog recently (6th November 2006) raised the need to discuss and clarify what semantic web technologies bring to the field of data analysis that is new, and why we should not simply remain with the relational model.
I think they have a very good point. Why go semantic when we can stay with the relational model? After all, anyone who matters already knows the relational model, so why learn RDF or OWL, right?
No. I think what makes the semantic web such an amazing, breakthrough technology is its intuitiveness. Anyone can understand and visualize nodes connected to other nodes that together make up a whole, whereas having to memorize tables and table joins, primary and foreign keys, etc., is more cumbersome.
The semantic web will make data resources accessible to more people, and to the people that matter - the ones generating the data. I think what defines a technology is the ratio between its usefulness and the amount of computational support it needs.
The semantic web will win, I think, mostly for sociological reasons - if the biologist is the data modeler, and knows that the tools to analyse the data can be called by complying with a particular ontology (this is my definition of ontology-driven data analysis), then that becomes an incentive to use such an ontology. With the widespread adoption of both databases and algorithms that depend on ontologies, changes in the ontology will not necessarily disrupt the flow of analysis, as can happen with relational databases.

Saturday, November 04, 2006

Ontology Driven Data Analysis

Many researchers have come to realize that ontologies will bridge the gap between databases and analysis algorithms. But how can that be done at a level of abstraction that is useful regardless of the data structure? Data analysis tools often become obsolete and need to be adapted as new significant parameters emerge from the data collection. Ontologies are already being used as effective tools for integrating databases with data mining tools for deriving knowledge. But this often happens at the large scale of data warehouses, where the bench biologists trying to derive conclusions from their data have little say in how the data analysis is conducted.
Ontology-driven analysis tools should be flexible enough to accept new parameters that might, or might not, improve the probability of the conclusions.
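
As a rough illustration of that flexibility, here is a minimal Python sketch in which the analysis step only looks at whatever properties the ontology currently declares. Everything in it is an assumption made for illustration: the function `summarize`, the property names and the records are hypothetical and are not part of S3DB.

```python
from statistics import mean


def summarize(records, ontology_properties):
    """Average every numeric property that the ontology currently declares.

    Records that lack a property are simply skipped for that property,
    so adding a new parameter to the ontology never breaks older data.
    """
    summary = {}
    for prop in ontology_properties:
        values = [
            record[prop]
            for record in records
            if isinstance(record.get(prop), (int, float))
        ]
        if values:
            summary[prop] = mean(values)
    return summary


# Hypothetical records; the second one carries a newly collected parameter.
records = [
    {"age": 40, "tumor_size_mm": 12.0},
    {"age": 60, "tumor_size_mm": 18.0, "marker_level": 3.0},
]

# The ontology grows by one property; the analysis code stays the same.
ontology_v1 = ["age", "tumor_size_mm"]
ontology_v2 = ontology_v1 + ["marker_level"]

summary_v1 = summarize(records, ontology_v1)
summary_v2 = summarize(records, ontology_v2)
```

The point of the design is that the new parameter enters through the ontology, not through a change in the analysis tool.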

Sunday, October 29, 2006

New release

The new release of S3DB is finally out.
The bugs reported in the previous version have been fixed and many new features have been added.
This new S3DB version (1.0) comes equipped with an RDF project importer (in addition to the existing XML project importer), and it enables projects to share resources, a functionality that has been badly needed, particularly in projects from the life sciences community.
Installation has also become easier for MySQL users, as the database now creates itself (in previous versions, a small extra step was necessary: creating the database, and a user to access it, from the MySQL command line).

Friday, September 29, 2006

Rich Matlab client at BioinformaticStation.org

The recent development of S3QL, a query language for automated interaction with S3DB, has enabled the development of client applications with rich graphic capabilities. Taking advantage of Matlab's new compiler toolbox, such a prototype is being developed at BioinformaticStation.org. A dedicated blog at BioinformaticStation.blogspot.com discusses this further, and the S3DB-related bits will be collected here.

Monday, September 25, 2006

RDF problem/solution

The Resource Description Framework is a tool to represent graphs by describing triples - two related concepts and the nature of their relationship. It was because of this simplicity that it was chosen to represent S3DB statements. The permissiveness with which S3DB indexes let vocabularies describe particular pieces of level-specific information and the graph nature of RDF complement each other. Thus, we see in RDF the ideal platform where domain-specific vocabularies, loosely coupled nomenclatures for emerging technologies and even lab jargon can be thrown into the same bucket of knowledge.
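
To make the triple idea concrete, here is a minimal Python sketch of statements held as (subject, predicate, object) tuples. It is not how S3DB stores or exports anything; the names `Triple` and `match` and all the example values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: two related concepts and the nature of their relationship."""
    subject: str
    predicate: str
    obj: str


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Hypothetical data: a controlled term and a piece of lab jargon
# sit side by side in the same graph.
graph = [
    Triple("Sample1", "extracted_from", "Patient7"),
    Triple("Sample1", "looks_funny_under_scope", "yes"),
    Triple("Sample2", "extracted_from", "Patient9"),
]

sample1_facts = match(graph, subject="Sample1")
sources = match(graph, predicate="extracted_from")
```

Because every statement has the same three-part shape, a new kind of relationship is just a new predicate, with no table to alter.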

The challenge, thus, is developing a comprehensive representation of the slowly evolving, often shifting data models. Although this could be accomplished by representing the model in XML (and indeed there is a module in S3DB that enables this approach), we are seeking a more flexible approach, one which does not impose a hierarchical structure or even the determination of classes and subclasses. RDF, coupled with RDF Schema to describe the few controlled vocabularies and relationships needed to make the model useful and functional on S3DB, seems to provide a good solution.

Friday, September 15, 2006

RDF export/import

A project in S3DB can be exported in the RDF/N3 format (W3C Notation 3 definition), the more "readable" version of RDF.

For an example of the nomenclature used for exporting ontologies, see N3 example or the RDF/XML conversion. #R, #V and #S refer to resources, rules and statements on S3DB. Similarly, #P refers to a project on S3DB (check out our paper for the definition of each of these).

How to read the RDF?
This RDF annotates data to the indexing schema, so each element can be traced back to other elements in the document. In reality, this example carries only 6 statements, the first six lines. The rest of the document has information regarding the "metadata" needed to describe those 3 statement indexes.

The best way to understand this example is through the interface view of the "Example Project" in the demo.
Try adding a "House" and properties of that house, for example a "Location" and an "Occupant". If you export the RDF (Example Project/Export project in RDF), you can clearly see what changed in the statements.
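
If you prefer not to compare the two exports by eye, a few lines of Python can do it. This is only a sketch under the assumption that each statement sits on its own line of the exported N3 text; the helper names and the statement lines are invented placeholders, not real S3DB output.

```python
def statement_lines(n3_text):
    """Collect the non-empty lines of an N3 export, ignoring surrounding spaces."""
    return {line.strip() for line in n3_text.splitlines() if line.strip()}


def diff_exports(before, after):
    """Return (added, removed) statement lines between two exports."""
    old = statement_lines(before)
    new = statement_lines(after)
    return sorted(new - old), sorted(old - new)


# Placeholder exports: one statement before, two after adding an occupant.
before = '<#House1> <#Location> "Lisbon" .\n'
after = before + '<#House1> <#Occupant> "Maria" .\n'

added, removed = diff_exports(before, after)
for line in added:
    print("+", line)
for line in removed:
    print("-", line)
```

In practice you would read `before` and `after` from two files saved from the export page, one before and one after editing the project.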

Monday, September 11, 2006

S3DB paper comes out

The description of the indexing schema behind S3DB has been published in Nature Biotechnology this week. It describes the sources of inspiration and the ideas involved in the creation of S3DB: Data integration gets "Sloppy"

Sunday, September 10, 2006

Interface

Some changes have been made to the interface, namely to the project management page, which now includes several options.

Sunday, July 23, 2006

Import from Excel

The bugs in importing data from an Excel spreadsheet have been fixed.

Thursday, July 20, 2006

Debugging Status

The following modules of S3DB are being updated:

- Sharing resources in different projects.
- Import from RDF
- Import from Excel

Monday, May 15, 2006

Towards Data integration

A new feature has been added to the import module of S3DB. Import now validates each UID and finds the appropriate information on it.
Try it at the demo.

Sunday, May 07, 2006

Source code release

The source code with the project tree improvements has already been released on SourceForge. A couple of bugs involving PostgreSQL compatibility have also been resolved. Try our demo and report any bugs you come across. Comments from S3DB users are highly appreciated and are in fact the best way to improve S3DB!

Demo relocation

The S3DB demo has been relocated.
It now runs on a faster, more reliable Linux machine! The link through s3db.org still works.

Tuesday, May 02, 2006

Improvements on project tree

It is now possible to create resources and rules directly on the project tree.
This is part of the interface improvement that will save at least 6 unnecessary clicks.

Saturday, April 01, 2006

S3DB has been released

The first version of S3DB has been released at s3db.sourceforge.net.
It is also accessible through S3DB.org, where we currently have a demo version running.
