KNB Informatics Research

Introduction
Ecological data are extremely variable in both syntax and
semantics. This section describes the approaches we are using to
manage complex ecological data and metadata from our many
collaborators, and it explains our goals and some of the reasoning
behind the architecture we are developing.

Goals
Our collaborators and colleagues include ecological and
environmental scientists across the nation and, for that matter,
around the world. The data they generate are therefore dispersed as
well: they are collected at widely separated locations and housed at
many different institutions. This is appropriate, because it keeps
the data close to their primary users, the data owners and
collectors.

However, the formation of NCEAS, the LTER Network Office, and similar
institutions aimed at cross-site, interdisciplinary, synthetic
research has shown that this distributed body of valuable data is
largely inaccessible to anyone but the original investigators. What
is good for the individual investigator (local data) is not always
best for the ecological community, which lacks access to national
data resources.

This gap is what the KNB is meant to close. We conceived of the KNB
as a mechanism for scientists to discover, access, interpret,
analyze, and synthesize the wealth of data collected by ecological
and environmental scientists nationally (and, eventually,
internationally). The infrastructure for this network must overcome
the major impediments to synthesizing data on ecology and the
environment:
- Data are widely dispersed
- Data are heterogeneous
- Synthetic analysis tools are needed

KNB Architecture
To address these issues, we have taken a layered approach to
infrastructure development. The three principal layers are: data
access, information management, and knowledge management.
Data Access: The base layer, data access, addresses
the dispersed nature of data. It consists of a national network of
federated institutions that have agreed to share data and metadata
using a common framework. That framework centers on two components:
the Ecological Metadata Language, a common language for describing
ecological data, and the Metacat metadata server, a flexible
XML-based database built to store a wide variety of metadata
documents. In addition, we plan to use the Storage Resource Broker, a
distributed data system developed at SDSC, to link the highly
distributed set of ecological field stations and universities that
house ecological data. Finally, we are developing a user-friendly
data management tool called Morpho that allows ecologists and
environmental scientists to manage their data on their own computers
and to access data that are part of this national network, the KNB.
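The benefit of a shared metadata language is that one piece of client code can index and search records produced by many different sites. The sketch below illustrates only that idea. It uses Python's standard `xml.etree.ElementTree`, two invented records whose element names are simplified and loosely modeled on EML (they are not schema-valid EML documents), and hypothetical helpers `summarize` and `search`; it does not reflect Metacat's actual query interface.

```python
import xml.etree.ElementTree as ET

# Two simplified metadata records, as if held at two different sites.
# Hypothetical and only loosely modeled on EML; NOT schema-valid EML.
SITE_A = """<eml packageId="siteA.1.1">
  <dataset>
    <title>Soil temperature, plot survey</title>
    <keywordSet><keyword>soil</keyword><keyword>temperature</keyword></keywordSet>
    <dataTable><attribute name="plot"/><attribute name="temp_c"/></dataTable>
  </dataset>
</eml>"""

SITE_B = """<eml packageId="siteB.4.2">
  <dataset>
    <title>Intertidal mussel counts</title>
    <keywordSet><keyword>mussels</keyword><keyword>intertidal</keyword></keywordSet>
    <dataTable><attribute name="transect"/><attribute name="count"/></dataTable>
  </dataset>
</eml>"""


def summarize(xml_text):
    """Extract the fields a search client might index from one record."""
    root = ET.fromstring(xml_text)
    dataset = root.find("dataset")
    return {
        "id": root.get("packageId"),
        "title": dataset.findtext("title"),
        "keywords": [k.text for k in dataset.iter("keyword")],
        "attributes": [a.get("name") for a in dataset.iter("attribute")],
    }


def search(records, keyword):
    """Return the ids of records that list `keyword` (case-insensitive)."""
    wanted = keyword.lower()
    return [
        info["id"]
        for info in map(summarize, records)
        if wanted in (k.lower() for k in info["keywords"])
    ]


if __name__ == "__main__":
    print(search([SITE_A, SITE_B], "Soil"))  # ['siteA.1.1']
    print(summarize(SITE_B)["attributes"])   # ['transect', 'count']
```

Because both records follow the same structure, `search` needs no site-specific logic; adding a third site means adding a third record, not new code.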
Information Management: The middle layer, information
management, addresses the heterogeneous nature of ecological data.
It consists of a set of tools that help convert raw data accessible
from the various contributors into information that is relevant to a
particular issue of interest to a scientist. There are two major
components of this information management infrastructure. First, the
Data Integration Engine will provide an intelligent software
environment that assists scientists in determining which data sets
are appropriate for particular uses, and assists them in creating
synthesized data sets. Second, the Quality Assurance Engine
will provide a set of common quality assurance analyses that can be
run automatically using information gathered from the metadata
provided for a data set.
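A metadata-driven quality assurance check is written once and then parameterized by each data set's metadata, rather than hand-coded for every data set. The following is a minimal sketch of that idea using only Python's standard library. The `ATTRIBUTES` dictionary is a hypothetical stand-in for information parsed from a metadata record, `check_table` is an illustrative name, and the three checks (missing values, declared type, declared range) are assumptions, not the actual checks the Quality Assurance Engine performs.

```python
import csv
import io

# Hypothetical attribute metadata, standing in for what a metadata record
# could declare about each column: a type and, optionally, an allowed range.
ATTRIBUTES = {
    "plot": {"type": str},
    "temp_c": {"type": float, "min": -40.0, "max": 60.0},
}

# A small invented data table with three deliberate problems.
SAMPLE = "plot,temp_c\nA1,12.5\nA2,abc\nA3,75\nA4,\n"


def check_table(csv_text, attributes):
    """Apply generic checks whose parameters come only from the metadata.

    Returns (row, column, problem) tuples; rows count data lines from 1.
    """
    problems = []
    for row_num, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        for name, spec in attributes.items():
            raw = row.get(name)
            # Check 1: a value is present.
            if raw is None or raw == "":
                problems.append((row_num, name, "missing value"))
                continue
            # Check 2: the value parses as the declared type.
            try:
                value = spec["type"](raw)
            except ValueError:
                problems.append((row_num, name, "wrong type"))
                continue
            # Check 3: the value lies within the declared range, if one is given.
            if "min" in spec and value < spec["min"]:
                problems.append((row_num, name, "below minimum"))
            if "max" in spec and value > spec["max"]:
                problems.append((row_num, name, "above maximum"))
    return problems


if __name__ == "__main__":
    for problem in check_table(SAMPLE, ATTRIBUTES):
        print(problem)
```

On `SAMPLE` this reports a wrong type in row 2, an out-of-range value in row 3, and a missing value in row 4; a different data set needs only different metadata, not a different checker.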
Knowledge Management: The top layer, knowledge
management, addresses the need for high-quality analytical tools
that allow scientists to explore and use the wealth of data
available from the data and information layers. It consists of a
suite of software applications that allow scientists to analyze and
summarize the data in the KNB. The Hypothesis Modeling Engine is a
data exploration tool that uses Bayesian techniques to evaluate the
wide variety of hypotheses that a particular set of data can address.
We also plan to provide visualization tools that let scientists
graphically depict combinations of data from the data and
information layers in appropriate ways.
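To make "evaluating hypotheses with Bayesian techniques" concrete, here is a deliberately small, generic example: comparing a few competing point hypotheses about an occupancy proportion with Bayes' rule, P(H | data) ∝ P(data | H) P(H), using a binomial likelihood. This is a textbook illustration in Python, not the Hypothesis Modeling Engine's algorithm; the function name `posterior`, the hypotheses, the prior, and the data are all invented for illustration.

```python
from math import comb


def posterior(hypotheses, prior, successes, trials):
    """Bayes' rule over a discrete set of point hypotheses for a proportion p.

    P(H | data) is proportional to P(data | H) * P(H), where P(data | H) is
    the binomial probability of `successes` in `trials` when p is H's value.
    """
    weighted = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes) * w
        for p, w in zip(hypotheses, prior)
    ]
    total = sum(weighted)  # the evidence, P(data)
    return [x / total for x in weighted]


if __name__ == "__main__":
    # Invented example: is a species' plot occupancy rate low, even, or high?
    hypotheses = [0.2, 0.5, 0.8]
    prior = [1 / 3, 1 / 3, 1 / 3]
    post = posterior(hypotheses, prior, successes=7, trials=10)
    print([round(x, 3) for x in post])  # [0.002, 0.367, 0.631]
```

With 7 of 10 plots occupied and equal prior weights, the "high occupancy" hypothesis ends up with roughly 63% of the posterior probability, the "even" hypothesis with about 37%, and the "low" hypothesis is nearly ruled out.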

KNB Sites

A wide variety of organizations and sites have agreed to
participate in the development and testing of the KNB. The LTER
Network of more than 24 research stations has agreed to participate
fully in the network, along with a variety of sites from the
Organization of Biological Field Stations (OBFS) and the UC Natural
Reserve System. In addition, we have a variety of individual and site
collaborators from the Multi-Agency Rocky Intertidal Network.

As the technology for the KNB matures, we expect to add many
new sites and sources of data to the network. Sites interested in
participating in the prototype network or in the final deployed
network should contact jones@nceas.ucsb.edu.