evince - information managed

National Aggregator - Functional Design

 

I thought it would be useful to send through some information about how you will interact with ISPP (National Aggregator).  The points below have been taken and summarised from the ‘Aggregator Functional Design’ document.

 

The ISPP solution is a repository of childcare and family services information, aggregated from Local Authority sources.

The ISPP solution effectively provides three interfaces to system integrators:

  • Aggregation Interface (Submission) - allows data to be uploaded and stored in the ISPP Repository.
  • Search and Discovery Interface (including Document Retrieval) - allows data stored in the ISPP Repository to be searched and relevant records to be retrieved.
  • Bulk Download - allows data to be harvested from the repository and downloaded in bulk to be cached locally.

 

Data is exchanged with the aggregator software in XML format.

Record Types - there are two record types:

Enhanced Childcare Directories (ECD)

Childcare information is provided to Local Authorities by Ofsted.  This data is stored in the Local Authority Childcare system, enhanced where applicable and then uploaded to ISPP in the form of an Enhanced Childcare Directory (ECD) record.

Family Services Directories (FSD)

Family Services information is held in multiple formats and in varying locations within Local Authorities.  As part of the requirements laid down by the ISPP project, a Local Authority will store this information in their local Family Information Services (FIS) software.  This will be uploaded to ISPP in the form of a Family Services Directory (FSD) record.

 

1. Data Provider (you)

A Data Provider (i.e. a Local Authority) holds data within the Local Database (i.e. the evince back end, with data held in table structures), and this is administered by the Local System (i.e. the evince front end).  This may be a single system/database covering both childcare and family services information, or two systems, one for each data type.

 

The Local System (evince) is responsible for uploading information in the correct form (ECD or FSD) to the aggregator.  Prior to upload, the responsibility is on the Data Provider to ensure that the record is valid with respect to the relevant schema (ECD or FSD); this is Local Validation.  The XML data is then uploaded via the Document Submission web service, which comes in two flavours: SOAP and RESTful.
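As a rough illustration of the Local Validation step, the sketch below checks that a record is well-formed XML and contains a few required elements before upload. It is only a stand-in: real validation should be run against the published ECD or FSD schema with a schema-aware validator. The element names and the REQUIRED_ELEMENTS table are assumptions made for this example and are not taken from the schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical required elements per record type; the real rules live in
# the ECD and FSD schemas.
REQUIRED_ELEMENTS = {
    "ECD": ["Identifier", "Title"],
    "FSD": ["Identifier", "Title", "Description"],
}


def local_validate(xml_text, record_type):
    """Return a list of problems; an empty list means the record passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for name in REQUIRED_ELEMENTS[record_type]:
        element = root.find(name)  # direct children of the root only
        if element is None or not (element.text or "").strip():
            problems.append(f"missing or empty element: {name}")
    return problems
```

A record would only be passed to the Document Submission service when the returned list is empty.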

 

2. ISPP

The ISPP solution is an aggregator in the true sense of the word.  It simply collates information that is provided to it and performs no data manipulation. 

 

Data uploaded to the National Aggregator is handled by the Workflow Controller, which performs the following tasks in this order:

 

  • VALIDATION - the uploaded ECD or FSD record is validated against the relevant schema. If the data is found to be invalid with respect to the schemas then a Validation Error is returned via the Workflow Controller and Document Submission service to the Local System. The upload/submission continues no further. If a file is found to be invalid at this point the data is not indexed or stored! Note that although validation is performed by the aggregator, the responsibility is still on the Local System to have pre-validated the data prior to upload. This step within the aggregator is simply a second level check to ensure that no invalid data is stored.

 

  • GEOCODING - assuming that the validation succeeds, the data then goes through a process of geocoding, where the addressing information held within the data is used to create latitude and longitude coordinates that are stored against the data. A Gazetteer service is used to resolve addressing/location data to latitude/longitude.

 

  • DUPLICATE CHECK - the aggregator must then determine if the data uploaded is a duplicate of existing data already held within the repository. Note that this step occurs after Geocoding so that geographical location can be taken into account when checking for duplicates. Duplicate checking is configurable within the aggregator, so that the data fields inspected as part of the check can be chosen. The aggregator never deletes any record uploaded. If the uploaded data is recognised as a duplicate then it is flagged as such and is not indexed in the repository. It is however saved, and a message is returned via the Workflow Controller and Document Submission service to the Local System notifying it of the duplicate. Note that there is then an offline task required whereby the Data Provider will coordinate with the owner of the authoritative record to confirm the status of each and re-upload where applicable.

 

  • INDEX - assuming that the data has not been marked as a duplicate (see above) the record is then indexed in the ISPP Index repository. This makes the record available for search via the Search and Discovery Interface.

 

  • STORE - the data is then stored in the ISPP Document Store. A Submission Success Message is then sent via the Workflow Controller and Document Submission interface to the Local System.

 

Error messages at any stage are routed via the Workflow Controller and Document Submission interface back to the Local System.
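To make the ordering of the steps concrete, here is a minimal sketch of how a workflow of this shape could be wired together. It is not the aggregator's actual code: the Repository class, the record layout (a dictionary with "source", "identifier" and "address" keys) and the three callables passed in are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    index: dict = field(default_factory=dict)  # searchable records
    store: dict = field(default_factory=dict)  # every saved record


def process_upload(record, repo, is_valid, geocode, is_duplicate):
    """Run one record through the five steps in order and return a status."""
    # VALIDATION: an invalid record goes no further and nothing is saved.
    if not is_valid(record):
        return "validation-error"
    # GEOCODING: attach coordinates before the duplicate check needs them.
    record = dict(record, location=geocode(record["address"]))
    key = (record["source"], record["identifier"])
    # DUPLICATE CHECK: duplicates are flagged and saved, but not indexed.
    if is_duplicate(record, repo):
        record["duplicate"] = True
        repo.store[key] = record
        return "duplicate"
    # INDEX, then STORE.
    repo.index[key] = record
    repo.store[key] = record
    return "success"
```

The returned status stands in for the message that is routed back to the Local System.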

 

3. Search and Discovery Consumer

Search criteria are sent from the consumer system (e.g. template web pages, website - Directgov, third sector, Local Authority).  These criteria are formatted into a RESTful search query URL (SRU), and are consumed by the Search and Discovery Interface.  Searching is performed over the ISPP Index repository (Solr implementation) and Search Results are returned to the consumer system.
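As an illustration of what formatting criteria into a search query URL might look like, the sketch below builds a URL using the standard SRU (Search/Retrieve via URL) parameter names. The base address is a placeholder, and the SRU version and the index name used in the query are assumptions; the real values would come from the ISPP documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real address comes from the ISPP documentation.
BASE_URL = "https://aggregator.example/search"


def build_search_url(query, start=1, maximum=10):
    """Build an SRU-style searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed version
        "query": query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    # urlencode percent-encodes the query so it is safe to put in a URL.
    return BASE_URL + "?" + urlencode(params)
```

For example, build_search_url('dc.title = "nursery"') produces a URL whose query parameter carries the encoded search expression.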

 

Selecting a record (Selected Record) from the search hits makes a call via the Document Retrieval service and the identified record is then returned in the relevant format (ECD or FSD) to the calling system.

 

The Search and Discovery/Document Retrieval Interface returns data in the XML format prescribed by the data definition schemas.  It is the responsibility of the consuming system to convert this XML into a form suitable for display (for example HTML).  It is via this method that the template web pages display data returned from the aggregator.

 

Submission Methods

 

Both Childcare (ECD) and Family Services (FSD) records can be submitted to the Aggregator via two separate web services, one utilising SOAP (Simple Object Access Protocol) and the other utilising REST (Representational State Transfer).

 

 

System Responses

There are three main categories of response that are returned from the web services:

 

  • Success Messages

When a resource is successfully validated, it is allocated a unique identifier and entered into the Solr index, allowing the resource to be retrieved via the Search and Discovery interface. A successful upload will return this unique identifier but will not have any associated error messages.

 

A checksum is created for every resource that is successfully uploaded. This code is specific to that document and will be the same if the identical file is uploaded again. In the interests of efficiency, if an identical file is detected by the checksum, no further action is taken on the new file.  This prevents an exact duplicate from being uploaded again and avoids running the duplicate checking algorithm unnecessarily.

 

  • Validation Exceptions

Schema validation focuses on checking the uploaded XML file against the appropriate ECD or FSD schema.

Any validation exception thrown will prevent the XML resource from being uploaded successfully. For every unsuccessful upload, the XML parser returns an error code and a message that indicate the invalid XML and a potential solution.

 

  • System Exceptions

System exceptions are returned in exactly the same XML structure as validation exceptions, but there is a defined set of error codes that refer to the Aggregator system.

 

All return messages are in XML format.
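A Local System will need to tell these responses apart. The sketch below shows one way that could be done; the element names (response, identifier, error, code, message) are invented for the example, since the actual response structure is defined by the aggregator and is not described here.

```python
import xml.etree.ElementTree as ET


def summarise_response(xml_text):
    """Classify a response as 'success' or 'error' and pull out key fields."""
    root = ET.fromstring(xml_text)
    # Hypothetical layout: each problem is an <error> holding <code> and <message>.
    errors = [
        (e.findtext("code"), e.findtext("message"))
        for e in root.iter("error")
    ]
    if errors:
        return {"status": "error", "errors": errors}
    # Hypothetical layout: a success carries the allocated <identifier>.
    return {"status": "success", "identifier": root.findtext("identifier")}
```

Because validation and system exceptions share one structure, the same parsing path can handle both, with the error code distinguishing them.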

 

 

System Error Reporting

 

System exceptions usually indicate an issue with the configuration of the software rather than with malformed XML.

 

Updating Existing Records

 

The aggregator automatically determines if uploaded records are updates or new based on a checksum.  The following summarises the rules that are employed and how data is recognised by the software:

 

  • Any record submitted for upload on behalf of a data source that is unchanged will not be updated (re-indexed or re-added to the repository).
  • A changed record uploaded on behalf of a data source will be updated within the ISPP repository.  Note that the unique identifier within both data definition schemas is the dc.identifier.  This must be unique for the uploading source (i.e. within the Local Authority) and is not assigned by the aggregator itself; it must be generated at source!
  • A record uploaded on behalf of a data source that matches an existing record and does not have an existing dc.identifier for that source will be recognised as a duplicate rather than an update.  Likewise a record uploaded on behalf of a different source (Local Authority) may also be recognised as a duplicate.
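The rules above can be sketched as a small decision function. This is an illustration only: the choice of SHA-256 as the checksum, the shape of the repo dictionary and the matches_existing callable (standing in for the aggregator's duplicate check) are all assumptions.

```python
import hashlib


def classify_upload(source, identifier, xml_bytes, repo, matches_existing):
    """Return 'unchanged', 'update', 'duplicate' or 'new' for an upload.

    repo maps (source, dc.identifier) to the checksum of the indexed record.
    matches_existing is a callable standing in for the duplicate check.
    """
    digest = hashlib.sha256(xml_bytes).hexdigest()
    key = (source, identifier)
    if key in repo:
        if repo[key] == digest:
            return "unchanged"  # not re-indexed or re-added
        repo[key] = digest
        return "update"  # same source and identifier, changed content
    if matches_existing(xml_bytes):
        return "duplicate"  # matches a record held under another key
    repo[key] = digest
    return "new"
```

The key point is that the pairing of source and dc.identifier decides whether a changed record is an update, which is why the identifier must be stable and generated at source.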

 

Deletion of Records

 

  • Deletion of records within the ISPP system is an administrative task and cannot be initiated via the document submission interface.
  • If there is a requirement to delete a record then a request must be raised via the ISPP Helpdesk and that record will then be removed.
  • The process of removal is based on the removal of the index of the record (so it will not be returned in searches); the record itself is not removed from the repository.
  • This gives the provider the means to correct the data and re-upload if required.  Upon upload the record will be re-indexed and again become available via the search and discovery interface.
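The distinction between removing the index entry and removing the record can be pictured with two dictionaries. This is a toy model with hypothetical names, not a description of how the ISPP Index or Document Store are implemented.

```python
def delete_record(key, index, store):
    """Remove a record from the index only; report whether a stored copy remains."""
    index.pop(key, None)  # no longer returned by searches
    return key in store  # the stored copy is left untouched


def reupload(key, record, index, store):
    """Store the corrected record and make it searchable again."""
    store[key] = record
    index[key] = record
```

After delete_record the record is invisible to search but still held; after reupload the corrected version is both stored and indexed.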

 

Duplicate Handling

 

The first record of a set of duplicates to be uploaded is the primary record.  The provider that uploaded the primary record is considered the record owner.  Subsequently uploaded records that the aggregator determines to be duplicates of the existing record are flagged as such, and a notification is sent back to the uploader (displayed, in this case, in the test harness front end).

 

It is important to note that as previously stated the duplicated record although stored in the repository will not be indexed (and therefore will not be available via the Search and Discovery Interface).  In the production environment the receipt of a duplicate notification message from the aggregator should lead into an offline process.  This will involve negotiation between the owner of the duplicate and the owner of the original with a decision on any further action taken jointly.  This offline process may result in the update/removal of the original and the re-upload of the duplicate in order to make it the authoritative version.

 

Note that it is possible to mark a record with the authoritative source flag to indicate that the uploading provider considers the record in question as being authoritative.  This may help in the subsequent offline negotiation.
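For a feel of how a configurable duplicate check might identify the primary record, consider the sketch below. The fields compared (title and postcode), the normalisation applied and the record layout are assumptions for the example; the aggregator's real matching rules are set in its own configuration.

```python
# Hypothetical configuration: which fields are compared.
DUPLICATE_FIELDS = ("title", "postcode")


def match_key(record, fields=DUPLICATE_FIELDS):
    """Normalise the configured fields so trivial differences are ignored."""
    return tuple(str(record.get(f, "")).strip().lower() for f in fields)


def find_primary(record, indexed_records, fields=DUPLICATE_FIELDS):
    """Return the indexed record matching on every configured field, or None."""
    wanted = match_key(record, fields)
    for existing in indexed_records:
        if match_key(existing, fields) == wanted:
            return existing
    return None
```

When find_primary returns a record, its owner is the provider to contact in the offline process; when it returns None the upload is not a duplicate under the configured fields.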

 

Document Indexing

 

When a resource is successfully validated and it has not been found to be a duplicate of existing data, the resource is indexed. Indexing consists of extracting specific elements of the resource and storing them so that the Search and Discovery module can locate the resource again by searching through the text of the aforementioned elements.

 

For example, if the resource is indexed against DC.Title and DC.Description, any resources that have the search keywords in either DC.Title or DC.Description will be located.
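The idea can be shown with a very small inverted index. The real index is a Solr implementation, so this is only a simplified model of the behaviour described; the record layout and the whitespace-based word splitting are assumptions.

```python
from collections import defaultdict

# Only these elements are extracted and made searchable.
INDEXED_FIELDS = ("DC.Title", "DC.Description")


def build_index(records):
    """Map each lower-cased word in the indexed fields to record ids."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for name in INDEXED_FIELDS:
            for word in fields.get(name, "").lower().split():
                index[word].add(record_id)
    return index


def search(index, keyword):
    """Return the sorted ids of records containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))
```

A keyword that appears only in an element that is not indexed (for instance a subject field in this model) will not locate the record, which is why the choice of indexed elements matters.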

 

As we find out more about interacting with the ISPP and as we start to firm up our systems I will send out further details.  I apologise that this email is very technical in places; I needed to keep a lot of the terms in as this is the language that will be used going forward.

 

Regards

Creative Commons Attribution-ShareAlike 2.0 UK: England & Wales