Work Descriptions for Semantic Prototype
Added by John Graybeal, last edited by Michael Meisinger on Sep 23, 2009

Introduction

The Semantic Prototype for the OOI Cyberinfrastructure will demonstrate a number of end-to-end operations and capabilities that have been semantically enabled for OOI. This means that the meaning of the concepts used in these operations will be captured in a systematic way that allows OOI systems to use the concepts for communication.

The Semantic Prototype will seek to apply semantic understanding to a few topic areas that will be illustrative, accessible, and risk-reducing. To be illustrative, the concepts must be understandable to the average user, and complex enough to show something interesting happening. To be accessible, the concepts must already exist (in some form) in the data sets being examined, and must have a simple enough scope that they can be sufficiently resolved in the prototype. To reduce risk for OOI, the semantic work must address tasks that are necessary for the OOI and that are complex, poorly understood, or open to many alternative approaches.

Objectives

The selected semantic prototype will examine an existing collection of data sets, which are already well structured and metadata-enabled, identifying semantic content that can be exploited and improved for OOI application. Once suitable semantic content is identified, further semantic development will be performed to create reference vocabularies, incorporate them into repositories, and use them to enable semantic solutions with the original data sets.

The semantically enabled end user solutions will include search capabilities, automatic semantic indexing of data sets, workflows for modifying existing reference vocabularies, evaluation and tagging of data sets, and validation of existing content for conformity with specifications.

Technology investigations/implementations will include semantic wiki evaluation, ontology repository integration and enhancement, reference vocabulary research, and ebRIM catalog solutions. Also, the semantic technologies of the Virtual Solar Terrestrial Observatory will be instantiated with and applied to oceanographic data sets.
Table of Tasks, Deliverables, and Resources

The following tasks are assigned to the principals led by John Graybeal, Ilya Zaslavsky, and Peter Fox. John Graybeal will serve as overall coordinator, and each of those participants will coordinate their own team's contribution.

The entire set of tasks is to be completed by November 30, 2009, with an interim progress report due September 30, 2009.

Tasks are in approximately chronological order, except as noted in text; sequences may be changed by mutual consent.

Table of Tasks

Task # | Description | Deliverables | Graybeal Group Resources | Zaslavsky Group Resources | Fox Group Resources | External Domain Resources
1 | Evaluate metadata available from CF-compliant DAP data | Support tool; list of identified and chosen data sets | 1 mw Lead, 2 mw Tech | | |
2 | Identify fields most suitable for semantic mapping | List of fields | 1 mw Sr Tech, 0.5 mw Lead | 1 mw Sr Tech | |
3A | Search for suitable core concepts vocabulary for term mapping | List of identified and chosen vocabularies | 2 mw Tech, 1 mw Sr Tech, 1 mw Research Asst | 0.5 mw Lead, 1 mw Sr Tech | |
3B | Create sufficiently complete core vocabulary for node mapping, perform mappings from individual terms to core vocabulary (needs: 7A, 7B) | Core vocabulary for each field; set of mappings for each | 2 mw Lead, 2 mw Tech, 2 mw Sr Tech | 0.5 mw Lead, 2 mw Sr Tech | |
3C | Align concepts with overarching vocabulary | Examples as described | 2 mw Tech | 0.5 mw Lead | |
3D | Advise on ontology development/mapping concepts | | | | $5K |
3E | Advise team on domain-specific semantic issues | | | | | $10K
4A | Define requirements for repository | MMI Requirements list | 0.5 mw Lead, 0.5 mw Sr Tech | 0.5 mw Lead, 0.5 mw Sr Tech | |
4B | Evaluate semantic wikis; select and instantiate one | 1 page summary | | 2.5 mw Sr Tech | |
4C | Persist vocabularies | Repository capability | 1.5 mw Research Asst | 1 mw Sr Tech | |
4D | Support term mappings in repository | Repository capability | 1 mw Sr Tech | 1 mw Sr Tech | |
4E | Support semantic inferencing in repository(ies?) | Repository capability | 2.5 mw Sr Tech | 1 mw Sr Tech | |
4F | Promote inter-repository collaboration/alignment | | 0.5 mw Lead, 0.5 mw Sr Tech, 0.5 mw Research Asst | 0.5 mw Lead | in kind |
5A | Evaluate potential inferencing relationships in chosen vocabularies | List of identified relations between fields | 0.5 mw Lead | | $20K |
5B | Evaluate VSTO application options and instantiate for OOI | Installed VSTO application with OOI concepts. | 0.5 mw Lead | | $5K |
5C | Create simple business rules/validation based on semantic content | In-place infrastructure for evaluating business rules. | 0.5 mw Lead | | $20K | $5K
6A | Create faceted search construct based on available semantics | Operational faceted search interface. | (see 4D) | | $20K |
6B | Create ebRIM interface (services interface, no user GUI) | Populated ebRIM catalog. | | 3 mw Sr Tech | |
7 | Implement dataset registration crawler | Operating dataset crawler. | (see 1) | 9 mw Sr Tech | |
8 | Implement dataset tagger and tag datasets | Executable dataset tagger. | 3 mw Tech, 1 mw Sr Tech, 1 mw Research Asst | | |
9 | Monitor and support community ontology efforts | | 1 mw Lead, 1 mw Tech | | $5K |
10A | Overall coordination of work | | 4.5 mw Lead | | |
10B | Coordination of team's tasks | | 1.5 mw Lead, 1 mw Research Asst | 2 mw Lead | 2 mw Lead (included) |
10C | Create design and other documents in support of December review. | Content input to OOI package. | 1 mw Lead, 1 mw Tech, 1 mw Research Asst | 1 mw Lead | |
TOTAL | | | 13.5 mw Lead, 9.5 mw Sr Tech, 13 mw Tech, 6 mw Research Asst | 6 mw Lead, 22 mw Sr Tech | $75,000 |

Descriptions of Tasks

1. Evaluate metadata available from CF-compliant DAP data

This is meant to be a manual review: a necessary scoping step to cherry-pick things we can work with and to see how good or bad existing data sets are. We will be looking for metadata that is available in each field and is semantically actionable. (Note that a set of guidance can also be developed, at least in our heads, as a result of this review.)
In order to perform this task, it will be necessary to write software to capture all the fields and terms that are present in each data set, and align them for review. The tool should include the ability to select whether this information is displayed for each data set. (Suggested sites to examine for well-behaved data: OceanSITES; USGS.)
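As a rough illustration of what such a support tool might do, the sketch below lines up the fields found in several data sets so they can be reviewed side by side, with an option to choose which data sets are shown. The data set names, attribute values, and function names are all hypothetical, and the metadata is assumed to have been harvested already into plain dictionaries; reading it from DAP servers is out of scope here.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical sample: attributes already harvested from three data sets.
DATASETS: Dict[str, Dict[str, str]] = {
    "dataset_a": {"title": "Mooring A", "units": "degree_Celsius",
                  "standard_name": "sea_water_temperature"},
    "dataset_b": {"title": "Mooring B", "standard_name": "sea_water_salinity"},
    "dataset_c": {"units": "m s-1", "institution": "Example Lab"},
}


def align_fields(datasets: Dict[str, Dict[str, str]],
                 selected: Optional[Iterable[str]] = None) -> List[List[str]]:
    """Return one row per field: [field, value in data set 1, value in data set 2, ...].

    Only data sets named in `selected` are shown (all of them if it is None).
    A missing field is shown as an empty string.
    """
    chosen = None if selected is None else set(selected)
    names = [n for n in sorted(datasets) if chosen is None or n in chosen]
    fields = sorted({field for n in names for field in datasets[n]})
    return [[field] + [datasets[n].get(field, "") for n in names] for field in fields]


def coverage(datasets: Dict[str, Dict[str, str]]) -> Dict[str, int]:
    """Count how many data sets carry each field."""
    counts: Dict[str, int] = {}
    for attrs in datasets.values():
        for field in attrs:
            counts[field] = counts.get(field, 0) + 1
    return counts


if __name__ == "__main__":
    for row in align_fields(DATASETS, selected=["dataset_a", "dataset_b"]):
        print("\t".join(row))
```

The coverage counts give a quick first indication of which fields are present often enough to be worth carrying into Task 2.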

2. Identify fields most suitable for semantic naming and mapping

(Suggested suitability priority: #1: variables; #2: units; #3: data set structure).
Other possible fields: contract; organization; OOI role; simple date concepts, though this is icky (e.g., startYear, startMonth, endYear, endMonth); location by name; location by, say, C-squares; sensor/sensor type; access/use licenses.
We can't enforce retroactive changes to whatever data set providers did, so we will only be able to adopt a single vocabulary if they did. Even then, we think mapping to other vocabularies will be useful (e.g., from CF to our vocabularies of interest), as described under 3.

3. For each of the top one to three identified fields, do the following steps:

More fields may be identified and processed if time allows.
The assumed treatment scenario is the same in each case. It is not expected that any of these fields will be flawlessly consistent controlled vocabularies.

A. search for suitable core concepts vocabulary for term mapping (e.g., CF standard names; netCDF structure types)
We want to list the vocabularies that are already available that may be useful. For netCDF structure types, there may be some relevant ontologies here:
http://wiki.esipfed.org/index.php/DataTypeOntologies
Units may be addressed by one of the UoM ontologies referenced by the Ontolog work.
Parameters are likely to be addressed best by the CF standard names, possibly in combination with another vocabulary like GCMD. The absence of a standard model for characterizing parameters may be an obstacle in this case.

B. create sufficiently complete core vocabulary for node mapping, perform mappings from individual terms to core vocabulary
If a concept vocabulary is found, this is the basis for the mapping concept vocabulary. It will likely be incomplete; as concepts are found that cannot be mapped, extend the core mapping vocabulary as needed to make it complete.
If no concept vocabulary is found, we will have to build one. This should include the range of values taken on by this field. Only one term should be included for each concept in the original, though no refinement of terms should be made (in order to be fast).
To validate that all the source concepts have been addressed, map each concept from the data sets in the given field to the core vocabulary.
As part of this process, determine a good practice for how to extend a vocabulary to include new terms.
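The map-then-extend loop described in this step could be sketched as follows. This is only an illustration under stated assumptions: the normalization rule, the synonym table, and all function names are hypothetical, and real term matching will need human review rather than string comparison alone.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize(term: str) -> str:
    """Hypothetical normalization: lower-case, hyphens and spaces to underscores."""
    return "_".join(term.strip().lower().replace("-", " ").split())


def map_to_core(source_terms: Iterable[str],
                core_vocab: Set[str],
                synonyms: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map each source term to a core term; return (mappings, unmapped terms)."""
    mappings: Dict[str, str] = {}
    unmapped: List[str] = []
    for term in source_terms:
        key = normalize(term)
        if key in core_vocab:
            mappings[term] = key            # direct match on the normalized form
        elif key in synonyms:
            mappings[term] = synonyms[key]  # match through a curated synonym
        else:
            unmapped.append(term)           # candidate for extending the vocabulary
    return mappings, unmapped


def extend_core(core_vocab: Set[str], unmapped: Iterable[str]) -> Set[str]:
    """Add one unrefined core term per unmapped concept."""
    return set(core_vocab) | {normalize(t) for t in unmapped}


if __name__ == "__main__":
    core = {"sea_water_temperature", "sea_water_salinity"}
    terms = ["Sea Water Temperature", "temp", "Chlorophyll-a"]
    mapped, missing = map_to_core(terms, core, {"temp": "sea_water_temperature"})
    print(mapped, missing)
```

Re-running the mapping after extending the vocabulary and getting an empty unmapped list is one simple way to confirm that every source concept has been addressed.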

C. align core vocabulary with one or more overarching vocabularies
In each core reference vocabulary, align terms where possible to at least one external ontology (e.g., SUMO) or thesaurus (e.g., WordNet).
For larger vocabularies, just align a small subset of the terms.
If one of the chosen fields is a parameter, align some of these terms to the IOOS parameter vocabulary.

D. advise team on ontology/vocabulary/mapping issues
To the extent ontological issues arise in the above processes, advise the team on best approaches to proceed.

E. advise team on domain-specific semantic issues
Provide insight/review on the appropriate semantic mappings for the identified terms. (Needs to be someone familiar with the data sets being used.)

4. Repository Tasks

A. Capture needs for semantic framework and repository
Starting with requirements at Ontology, MMI, and NeOn (and BioPortal if available), prioritize the most important requirements for this project and for OOI CI. Update an MMI requirements document with the result.
(The MMI Ontology Registry is the de facto ontology repository to be used in this demonstration; a semantic wiki solution will also be chosen for comparison, see next item.)

B. Evaluate semantic wiki solutions against those needs; instantiate the selected wiki for the prototype.
In the semantic wiki system, users navigate concepts and can edit them in the wiki.
Evaluate RPI semantic wiki (Fox, McGuinness) as one of the options for this prototype.
Instantiate the selected wiki in a location suitable for use with the prototype.
A possibly relevant requirements note: When versioned URIs are needed, at that point you should be curating the results (in particular, preserving those URIs in perpetuity); before that point, curation is only needed for social reasons, and the versioned URIs are not as central.

C. Persist vocabularies
Be able to persist the created core mapping vocabularies (whether created by extension or from raw concepts) in the repository. (Vocabularies may be persisted in either the MMI ontology registry, or a semantic wiki; at least one core vocabulary should be in each type of repository, for comparison purposes.)

D. Support term mappings in repositories
Be able to create a mapping from one term to another, either within or across vocabularies. The mapping should be exposable as an RDF triple that can be consumed by other semantic tools.
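To make "exposable as an RDF triple" concrete, here is a minimal sketch that serializes one term mapping as a single N-Triples line. The use of the SKOS mapping properties is an assumption (the text above does not name a mapping vocabulary), and the example URIs and the function name are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# SKOS mapping properties; choosing SKOS at all is an assumption of this sketch.
MAPPING_RELATIONS = {"exactMatch", "closeMatch", "broadMatch",
                     "narrowMatch", "relatedMatch"}


def _check_uri(uri: str) -> str:
    """Reject strings that cannot be written between angle brackets as-is."""
    if not uri or any(ch in uri for ch in ' <>"\n\t'):
        raise ValueError("not a usable URI: %r" % uri)
    return uri


def mapping_triple(subject_uri: str, relation: str, object_uri: str) -> str:
    """Serialize one term mapping as an N-Triples line."""
    if relation not in MAPPING_RELATIONS:
        raise ValueError("unknown mapping relation: %r" % relation)
    return "<%s> <%s%s> <%s> ." % (
        _check_uri(subject_uri), SKOS, relation, _check_uri(object_uri))


if __name__ == "__main__":
    # Hypothetical term URIs in two vocabularies.
    print(mapping_triple("http://example.org/vocab/local/temp",
                         "exactMatch",
                         "http://example.org/vocab/cf/sea_water_temperature"))
```

Because each mapping is one self-contained line, a set of mappings can be exported as a plain text file and loaded by other semantic tools.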

E. Support semantic inferencing in repository
Use SPARQL to identify equivalent and narrower terms in response to an external query.
Needs to be turned on in the MMI repository. Some amount of validation and verification will be needed. (Not clear if/how possible with semantic wiki, so this task is not defined for the semantic wiki items at this time.)
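The task calls for SPARQL against the repository; the sketch below is not SPARQL, but a plain-Python illustration of the result such a query would be expected to return: every term reachable from a starting term through equivalence (in either direction) or through narrower relations. The predicate labels "equivalent" and "narrower" and the sample terms are hypothetical stand-ins for whatever properties the repository actually uses.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def equivalent_and_narrower(triples: Iterable[Triple], start: str) -> Set[str]:
    """Return all terms equivalent to or narrower than `start` (excluding `start`)."""
    equiv: Dict[str, Set[str]] = {}     # symmetric equivalence links
    narrower: Dict[str, Set[str]] = {}  # term -> directly narrower terms
    for s, p, o in triples:
        if p == "equivalent":
            equiv.setdefault(s, set()).add(o)
            equiv.setdefault(o, set()).add(s)
        elif p == "narrower":           # read as: s has narrower term o
            narrower.setdefault(s, set()).add(o)

    # Breadth-first walk; the `seen` set also protects against cycles.
    seen = {start}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        for nxt in equiv.get(term, set()) | narrower.get(term, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


if __name__ == "__main__":
    sample = [
        ("temperature", "narrower", "sea_water_temperature"),
        ("sea_water_temperature", "equivalent", "water_temp"),
        ("water_temp", "narrower", "sea_surface_temperature"),
        ("salinity", "narrower", "practical_salinity"),
    ]
    print(sorted(equivalent_and_narrower(sample, "temperature")))
```

A small hand-checked case like this one can also serve as a fixture when validating that inferencing is behaving as expected once it is turned on in the repository.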

F. Promote inter-repository semantic collaboration/alignment
This continues MMI's initial attempts in this direction. Some coordination with Open Ontology Repository's current repository discussions will be needed.

5. Evaluate semantic leveraging options

A. Evaluate potential inferencing relationships in chosen vocabularies.
In particular, determine whether any fields can be used to infer the value of other fields/facets.
This evaluation should assess the semantic leveraging that is available for VSTO or other software to use. It includes evaluating the set of concepts and reporting (a) whether they are the best concepts to use from the original data files, and (b) what relationships exist between them.

B. Evaluate VSTO application options and instantiate for OOI.
Determine what analogs to the capabilities of VSTO can be created using the OOI concepts. For example, if we find the structure of the data set has a good correlation to the variables contained within it, we could use that with the VSTO system. Even if no useful relationships among concepts are identified, set up the VSTO system to reflect the fields we do have, so that additional fields can be leveraged later.

C. Create simple business rules/validation based on semantic content.
Identify and install needed software to enable semantic validation of the data. Set up basic business rules based on semantic fields. Communicate method for doing so to other team members.
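One way to picture "basic business rules based on semantic fields" is as small checks that each return a message when a data set's metadata violates the rule. The sketch below assumes this shape; the two rules, the field names, and the table of allowed units are hypothetical examples, not rules chosen by the project, and the software to be installed for this task may express rules quite differently.

```python
from typing import Callable, Dict, List, Optional, Set

Metadata = Dict[str, str]
Rule = Callable[[Metadata], Optional[str]]


def make_rules(allowed_units: Dict[str, Set[str]]) -> List[Rule]:
    """Build two example rules from a (hypothetical) table of allowed units."""

    def has_standard_name(meta: Metadata) -> Optional[str]:
        return None if meta.get("standard_name") else "missing standard_name"

    def units_match(meta: Metadata) -> Optional[str]:
        name = meta.get("standard_name")
        allowed = allowed_units.get(name) if name else None
        if allowed is not None and meta.get("units") not in allowed:
            return "units %r not allowed for %s" % (meta.get("units"), name)
        return None

    return [has_standard_name, units_match]


def validate(meta: Metadata, rules: List[Rule]) -> List[str]:
    """Run every rule and collect the messages of the ones that fail."""
    messages = (rule(meta) for rule in rules)
    return [m for m in messages if m is not None]


if __name__ == "__main__":
    rules = make_rules({"sea_water_temperature": {"degree_Celsius", "K"}})
    print(validate({"standard_name": "sea_water_temperature", "units": "m"}, rules))
```

Keeping each rule as an independent function makes it straightforward for other team members to add rules without touching the validation loop.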

6. Select most suitable front end(s) to the Data Sets Catalog out of these options

A. Create faceted search construct based on available semantics
Use the selected fields to create a faceted search construct. This may not be a particularly interesting exercise, depending on the vocabularies and semantic relationships that are found. But it is a necessary step toward a more interesting semantic capability, and we need to see this operational and be able to modify it.
The front end should use the repository (cached vocabularies and inferencing) to achieve its functionality; therefore, this task requires Task 7.
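A faceted search construct can be reduced to two operations: counting the values available under each facet, and filtering records by the values a user selects. The sketch below shows both over an in-memory list; the record layout, facet names, and values are hypothetical, and a real front end would draw them from the repository rather than from a hard-coded list.

```python
from collections import Counter
from typing import Dict, List, Set

Record = Dict[str, str]

# Hypothetical catalog records, one per data set.
RECORDS: List[Record] = [
    {"id": "ds1", "variable": "sea_water_temperature",
     "units": "degree_Celsius", "structure": "timeSeries"},
    {"id": "ds2", "variable": "sea_water_salinity",
     "units": "1", "structure": "timeSeries"},
    {"id": "ds3", "variable": "sea_water_temperature",
     "units": "K", "structure": "profile"},
]


def facet_counts(records: List[Record], facet: str) -> Counter:
    """Count how many records carry each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)


def filter_records(records: List[Record],
                   selections: Dict[str, Set[str]]) -> List[Record]:
    """Keep records whose value matches the selection for every chosen facet."""
    return [r for r in records
            if all(r.get(facet) in allowed for facet, allowed in selections.items())]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "variable"))
    hits = filter_records(RECORDS, {"variable": {"sea_water_temperature"},
                                    "structure": {"profile"}})
    print([r["id"] for r in hits])
```

Passing a set of allowed values per facet leaves room for the repository to expand a selected term into its equivalent and narrower terms before filtering.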

B. Create ebRIM interface to provide services interface (no user GUI)
This migrates an ebRIM catalog technology, an implementation of a widely accepted standard, into this domain. No user interfaces will be developed/enhanced as part of this adoption, but the service requests provided by ebRIM should be available.

7. Implement dataset registration crawler to add new datasets (instances and 'classes') as they become available.

The goal is to make these tasks as efficient as possible.
A. report on missing metadata (of the fields we are pursuing)
B. compare term and other metadata value instances to existing; update caches/indices
C. flag new terms for classification/mapping
D. create process for reviewing/adding terms and mappings to appropriate vocabularies
E. Capture needs for semantic framework and repository, and evaluate selected solutions against those needs
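Steps A through C above could be combined into a single per-data-set review pass, sketched here. The function name, the field list, and the shape of the term cache are hypothetical; how the crawler discovers data sets and reads their metadata is left out.

```python
from typing import Dict, List, Set, Tuple


def review_dataset(meta: Dict[str, str],
                   required_fields: List[str],
                   known_terms: Dict[str, Set[str]]
                   ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Review one data set's metadata.

    Returns (missing_fields, new_terms):
      A. missing_fields: pursued fields that are absent or empty;
      B. values are compared with the cache `known_terms`, which is updated in place;
      C. new_terms: (field, value) pairs not seen before, flagged for mapping.
    """
    missing = [f for f in required_fields if not meta.get(f)]
    new_terms: List[Tuple[str, str]] = []
    for field in required_fields:
        value = meta.get(field)
        if not value:
            continue
        cache = known_terms.setdefault(field, set())
        if value not in cache:
            new_terms.append((field, value))
            cache.add(value)
    return missing, new_terms


if __name__ == "__main__":
    cache = {"standard_name": {"sea_water_temperature"}}
    meta = {"standard_name": "sea_water_salinity", "units": "1"}
    print(review_dataset(meta, ["standard_name", "units", "structure"], cache))
```

Because the cache is updated as terms are flagged, a term is reported for classification only the first time the crawler encounters it.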

8. Provide a "data set tagger" capability

This can tag the CF data sources with our controlled vocabularies or, using the linked data model, create a separate RDF annotation (one that says "tag Boof appliesTo dataURI ..."). The first approach will require a new netCDF element; the second requires dataURIs a la Linked Data. A decision will have to be made as to how to capture the tags that are applied to data sets (attempt to modify the data sets, or maintain independently).
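The second approach, keeping tags independently of the data sets, might look like the sketch below: a small store that records which tags apply to which data URI and can export them as one triple per tag. The class name, the `appliesTo` predicate URI, and the example URIs are hypothetical, and N-Triples is assumed as the export format.

```python
from typing import Dict, List, Set


class TagStore:
    """Keeps tag annotations separately from the data sets they describe."""

    def __init__(self, applies_to: str = "http://example.org/ooi/appliesTo") -> None:
        self.applies_to = applies_to          # hypothetical predicate URI
        self._tags: Dict[str, Set[str]] = {}  # data URI -> set of tag URIs

    def tag(self, tag_uri: str, data_uri: str) -> None:
        """Record that `tag_uri` applies to `data_uri` (repeats are ignored)."""
        self._tags.setdefault(data_uri, set()).add(tag_uri)

    def tags_for(self, data_uri: str) -> List[str]:
        return sorted(self._tags.get(data_uri, set()))

    def to_ntriples(self) -> str:
        """Export every annotation as '<tag> <appliesTo> <dataURI> .' lines."""
        lines = []
        for data_uri in sorted(self._tags):
            for tag_uri in sorted(self._tags[data_uri]):
                lines.append("<%s> <%s> <%s> ." % (tag_uri, self.applies_to, data_uri))
        return "\n".join(lines)


if __name__ == "__main__":
    store = TagStore()
    store.tag("http://example.org/vocab/Boof", "http://example.org/data/ds1")
    print(store.to_ntriples())
```

This leaves the original data files untouched, at the cost of requiring a stable URI for every tagged data set.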

9. Monitor and support ontology efforts.

In particular, units of measure (Ontolog), Device Ontologies (MMI and W3C), and Data Type ontologies (ESIP Federation) will all be applicable to OOI CI, and all are in formative stages.

10. Coordinate Work

A. Overall coordination of work
John Graybeal will coordinate across the 3 activities, and work with Michael Meisinger and Matt Arrott of UCSD as needed. This coordination will include at least one site visit per month (travel paid by contractor), attendance at the OOI CI kickoff meeting (Sep 9-12), and training at Unidata on THREDDS, TDS, and associated topics (Aug 2-6).
B. Coordination of team tasks
Each team lead will coordinate their team's activities, respond to coordination and support requests from John Graybeal, and respond to overall contract management by Matt Arrott.
C. Create design and other documents for December review
Create design documents and analyses, reflecting the work above, suitable for inclusion in the December OOI review.
