It is well known that users of Photon and Neutron sources move between facilities and instruments, using and combining different techniques, together with simulation data, to further their science (see, for example, http://pan-data.eu/Users2012e-Results). They thus collect data in different formats, with different sets of experimental parameters and different data set conventions, depending on the local practices of the facility and instrument. Users who wish to search, access, combine, compare, and apply analysis software to data collected from different sources must take these differences into account, which can be a significant barrier to the effective use of the data. Further, software developers have to deal with different formats and parameters, and future re-users of the data would need to search for and access it in different systems.
A key issue is that the size of data sets already exceeds the capacity of facility users to handle them, so it is becoming essential that the facilities provide resources that allow such data to be rapidly stored and managed. However, it is not sufficient just to provide a repository for large data sets, since it may be impractical for users to download them. Rather, it is necessary that such data servers can return specific subsets of the data, such as data slabs or associated metadata, as requested by the user. Such granular access to the data presents a number of technical challenges that this workshop is designed to address.
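To make the idea of granular access concrete, the following is a minimal sketch of a slab request, assuming the data set is a regular N-dimensional array and that a slab is described by a start index and an extent along each axis. The function name `read_slab` and the in-memory nested-list array are hypothetical stand-ins for illustration; they are not part of any existing facility server or agreed protocol.

```python
from typing import Sequence


def read_slab(data, start: Sequence[int], count: Sequence[int]):
    """Return the sub-block of a nested-list array that begins at `start`
    and spans `count` elements along each leading axis.

    Axes beyond len(start) are returned whole.
    """
    if len(start) != len(count):
        raise ValueError("start and count must have the same length")
    if not start:
        return data
    first, extent = start[0], count[0]
    if first < 0 or extent < 0 or first + extent > len(data):
        raise IndexError("slab exceeds array bounds")
    # Select the requested range on this axis, then recurse into the next axis.
    return [
        read_slab(item, start[1:], count[1:])
        for item in data[first:first + extent]
    ]


# Hypothetical stack of 5 detector frames, each 3 rows by 4 columns.
frames = [
    [[f * 100 + r * 10 + c for c in range(4)] for r in range(3)]
    for f in range(5)
]

# Request frames 1-2, all 3 rows, columns 2-3, instead of the full stack.
slab = read_slab(frames, start=(1, 0, 2), count=(2, 3, 2))
print(slab[0][0])  # [102, 103]
```

In a real deployment the array would live on the server and only the selected block would cross the network, which is the point of the approach: the volume transferred scales with the request rather than with the size of the stored data set.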
In this workshop, we will explore how to support and enable the interoperability of data within and across user facilities. This would include considering the following aspects:
- Data Discovery. How do we integrate existing online resources, such as ICAT, the Materials Data Facility, and Globus Online, so that users can locate data to which they have access, no matter where it is stored? This would consider the use of common data and metadata standards and protocols for discovery, access and interpretation of data sets.
- Authentication. How do we provide a unified authentication scheme that determines whether the users have access to the remote data? This would consider common schemes to identify users and how to accommodate different local authentication schemes.
- Server Queries. What queries are required so that users can inspect the remote data and selectively download subsets of the data and metadata?
- Transfer Protocols. How do we handle network requests and how do we serialize the data that is returned?
- APIs. What kind of APIs are needed/provided? How would applications need to be modified?
- Reproducibility. Do we need to capture and store queries in order to a) make processes reproducible and/or b) facilitate actions that repeatedly access the same data slab?
- Controlled Vocabulary. Do we need controlled vocabularies for data and/or resource discovery?
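As one possible illustration of the Server Queries, Transfer Protocols and Reproducibility questions above, the sketch below serializes a slab query as canonical JSON and derives a stable identifier from it, so that the same request can be stored and replayed later. Everything here is an assumption made for illustration: the query fields (`dataset`, `path`, `start`, `count`), the choice of JSON and SHA-256, and the `QueryLog` class are hypothetical and do not represent an agreed standard or any existing server API.

```python
import hashlib
import json


def canonical_form(query: dict) -> str:
    """Serialize a query with sorted keys and no extra whitespace, so that
    logically identical queries always produce the same text."""
    return json.dumps(query, sort_keys=True, separators=(",", ":"))


def query_id(query: dict) -> str:
    """Derive a stable identifier from the canonical form of the query."""
    return hashlib.sha256(canonical_form(query).encode("utf-8")).hexdigest()


class QueryLog:
    """In-memory store of captured queries, keyed by their identifier."""

    def __init__(self) -> None:
        self._records = {}

    def record(self, query: dict) -> str:
        qid = query_id(query)
        self._records[qid] = canonical_form(query)
        return qid

    def replay(self, qid: str) -> dict:
        """Return the stored query so it can be re-issued unchanged."""
        return json.loads(self._records[qid])


# Hypothetical query for one data slab; all field names are illustrative.
query = {
    "dataset": "example-facility/scan-0001",
    "path": "/entry/data",
    "start": [1, 0, 2],
    "count": [2, 3, 2],
}

log = QueryLog()
qid = log.record(query)

# The same query written with a different key order maps to the same identifier.
reordered = {
    "count": [2, 3, 2],
    "start": [1, 0, 2],
    "path": "/entry/data",
    "dataset": "example-facility/scan-0001",
}
print(query_id(reordered) == qid)  # True
print(log.replay(qid) == query)    # True
```

Storing the identifier alongside an analysis result would let a later user retrieve exactly the same slab, provided the underlying data set is itself immutable or versioned, which is a separate question for the workshop.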
The results will be disseminated to facilities around the world for comment, with a view to providing international standards for accessing data through cloud servers. An open-source project would be established to implement such servers, although facilities would be free to implement their own to accommodate local variations in server and data architecture provided they respond to the agreed network queries and transfer protocols.
The aim of the workshop would be to identify and refine areas of work that could form the programme for a future RDA Working Group, within the scope of the RDA Interest Group on the Research Data Needs of Photon and Neutron Science. We would expect the outputs of the workshop to include:
- A workshop report, outlining a draft specification and a development roadmap to support an interoperable infrastructure across facilities, for consultation with facilities.
- Input into a draft RDA Working Group proposal describing a specific set of time-limited activities within the roadmap, such as specifying a reference architecture and core interchange protocols.
- Identification of potential implementations to provide a proof of concept of the working group recommendations.
- Matthews, Brian
- Osborn, Raymond
- Schluenzen, Frank
- Fernández Carreiras, David