- 9:00-9:15 Welcome
- 9:15-9:50 Keynote: Smart Statistics. Data Analytics Services Based on Open Data Platforms, Emanuele Baldacci, Istat
- 9:50-10:30 Session 1 (Long papers)
- Creating and Utilizing Linked Open Statistical Data for the Development of Advanced Analytics Services, Evangelos Kalampokis, Areti Karamanou, Andriy Nikolov, Peter Haase, Richard Cyganiak, Bill Roberts, Paul Hermans, Efthimios Tambouris and Konstantinos Tarabanis
- Early analysis and debugging of linked open data cubes, Enrico Daga, Mathieu D’Aquin, Aldo Gangemi and Enrico Motta
- 10:30-11:00 Coffee Break and Poster Session
- Linked data to support Clinical and Non-Clinical Reporting, Marc Andersen
- A Template for Handling Statistical Data in RDF, Yu Asano, Makoto Iwayama, Hideaki Takeda, Seiji Koide, Fumihiro Kato and Iwao Kobayashi
- Semantic Similarity and Correlation of Linked Statistical Data Analysis, Sarven Capadisli, Albert Meroño-Peñuela, Sören Auer and Reinhard Riedl
- News Fact-checking: One Practical Application of Linked Statistics, Tatiana Tarasova
- 11:00-12:00 Session 2 (Long papers)
- Publishing the 15th Italian Population and Housing Census as Linked Open Data, Raffaella Aracri, Stefano De Francisci, Andrea Pagano, Monica Scannapieco, Laura Tosco and Luca Valentino
- Representing verifiable statistical index computations as linked data, Jose Emilio Labra Gayo, Hania Farhan, Juan Castro Fernández and Jose María Alvarez Rodríguez
- Publishing Official Classifications in Linked Open Data, Giorgia Lodi, Antonio Maccioni, Monica Scannapieco, Mauro Scanu and Laura Tosco
- 12:00-12:20 Session 3 (Short papers)
- Containment and Complementarity Relationships in Multidimensional Linked Open Data, Marios Meimaris and George Papastefanatos
- From Flat Lists to Taxonomies: Bottom-up Concept Scheme Generation in Linked Statistical Data, Albert Meroño-Peñuela, Ashkan Ashkpour and Christophe Guéret
- 12:20-12:45 Session 4 (Challenge papers)
- Multiscale Exploration of Spatial Statistical Datasets: A Linked Data Mashup Approach, Ba-Lam Do, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling, Amin Anjomshoaa, Amin Tjoa
- Geo-statistical Exploration of Milano Datasets, Irene Celino, Gloria Re Calegari
- Announcement of the Challenge Winner and Final Remarks
SemStats 2014 Call for Challenge
Summary
The SemStats Challenge is back with more action! It is organized in the context of the SemStats 2014 workshop. Participants are invited to apply statistical techniques and semantic web technologies within one of two possible tracks, namely the Census Data Track and Open Track. Following up on the success of last year’s Challenge, this year, the Census Data Track will have data from France, Italy, and Ireland. We would also like to introduce the new Open Track, where any type of statistical data of your choice may be used in the challenge.
The challenge consists of building mashups or visualizations, as well as comparisons, analytics, alignment and enrichment of the data and concepts involved in statistical data (see below for the data made available and additional requirements).
The deadline for participants to submit their short papers and applications is Tuesday 30 September 2014, 23:59 Hawaii Time. Submissions are made via EasyChair by selecting the Challenge paper category.
All challenge participants are strongly encouraged to send their contact information to semstats2014@easychair.org so that they can be kept informed of any changes to the data provided. For any questions on the challenge, please contact semstats2014@easychair.org.
Census Data Track
We would like to point you to plenty of raw data. The conversion process will be considered as part of the challenge.
- Istat (Italian National Institute of Statistics) offers Census 1991, 2001, 2011 data and metadata: http://www.istat.it/it/archivio/104317#variabili_censuarie (See “Variabili censuarie / Censimento della popolazione e delle abitazioni”), which gives the population count by age range and sex at a very detailed geographic level.
- INSEE (National Institute of Statistics and Economic Studies) provides several datasets:
  - Detailed results for Census 2011: http://insee.fr/fr/themes/detail.asp?reg_id=0&ref_id=fd-RP2011&page=fichiers_detail/RP2011/telechargement.htm giving results on individuals only at the region level but with a great number of other variables (see http://insee.fr/fr/ppp/bases-de-donnees/fichiers_detail/RP2011/doc/contenu_RP2011_INDREG.pdf)
  - Detailed results for Census 2010: http://insee.fr/fr/themes/detail.asp?reg_id=0&ref_id=fd-RP2010&page=fichiers_detail/RP2010/telechargement.htm with, for example, results on individuals at a smaller geographic level
  - Key figures for Census 2011 on different themes at the municipality level: http://insee.fr/fr/bases-de-donnees/default.asp?page=recensement/resultats/2011/donnees-detaillees-recensement-2011.htm
- ABS (Australian Bureau of Statistics) offers Census 2011 data at http://stat.abs.gov.au/ . Data of particular interest to this challenge can be found by navigating to: Social Statistics > 2011 Census of Population and Housing > Time Series Profiles (Local Government Areas) > T03 Age by Sex (LGA)
- CSO (Central Statistics Office) Ireland’s Census 2011 data and metadata are available as Linked Data: http://data.cso.ie/
- You are welcome to use any other Census data whether it is Linked Data based or not
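As an illustration of the conversion step mentioned above, here is a minimal sketch of how a few rows of population counts by area, sex and age range might be turned into RDF Data Cube observations in Turtle. The sample rows, the column names, the `ex:` namespace and the dataset name `ex:population-by-age-sex` are assumptions made for this example; the files listed above use their own layouts and codes. Only the Python standard library is used.

```python
import csv
import io

# Hypothetical sample rows; real census files use other column names and codes.
SAMPLE_CSV = """area,sex,age_range,population
A001,F,0-4,120
A001,M,0-4,131
"""

# qb: and sdmx-* are published vocabularies; ex: is a placeholder namespace.
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix sdmx-code: <http://purl.org/linked-data/sdmx/2009/code#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/census/> .
"""


def row_to_observation(row):
    """Render one CSV row as a qb:Observation in Turtle."""
    area = row["area"]
    sex = row["sex"]
    age = row["age_range"]
    value = int(row["population"])  # fails early on non-numeric counts
    return (
        f"ex:obs-{area}-{sex}-{age} a qb:Observation ;\n"
        f"    qb:dataSet ex:population-by-age-sex ;\n"
        f"    sdmx-dimension:refArea ex:area-{area} ;\n"
        f"    sdmx-dimension:sex sdmx-code:sex-{sex} ;\n"
        f"    sdmx-dimension:age ex:age-{age} ;\n"
        f'    sdmx-measure:obsValue "{value}"^^xsd:integer .\n'
    )


def csv_to_turtle(text):
    """Convert CSV text into a Turtle document with one observation per row."""
    reader = csv.DictReader(io.StringIO(text))
    return PREFIXES + "\n" + "\n".join(row_to_observation(r) for r in reader)


turtle = csv_to_turtle(SAMPLE_CSV)
print(turtle)
```

This sketch only emits observations. A well-formed Data Cube dataset also needs a `qb:DataSet` description linked to a `qb:DataStructureDefinition`, and the area and age codes would normally point to published code lists rather than placeholder URIs.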
Open Track
There is one essential requirement for the Open Track: papers must describe a publicly available application. We would love to see everyone play and learn from what you have created. You are welcome to use any statistical data whether it is already in Linked Data shape or not! While you are at it, why not combine it with data from other domains?
Here are some statistical linked dataspaces (off the top of our heads):
SemStats 2014 Call for Papers
Workshop Summary
The goal of this workshop is to explore and strengthen the relationship between the Semantic Web and statistical communities, to provide better access to the data held by statistical offices. It will focus on ways in which statisticians can use Semantic Web technologies and standards in order to formalize, publish, document and link their data and metadata. It follows the 1st Semantic Statistics workshop held at ISWC 2013 (SemStats 2013), which was a big success and attracted more than 50 participants throughout the day.
The statistical community shows more and more interest in the Semantic Web. In particular, initiatives have been launched to develop semantic vocabularies representing statistical classifications and discovery metadata. Tools are also being created by statistical organizations to support the publication of dimensional data conforming to the Data Cube W3C Recommendation. But statisticians see challenges in the Semantic Web: how can data and concepts be linked in a statistically rigorous fashion? How can we avoid fuzzy semantics leading to wrong analyses? How can we preserve data confidentiality?
The workshop will also cover the question of how to apply statistical methods or treatments to linked data, and how to develop new methods and tools for this purpose. Except for visualisation techniques and tools, this question is relatively unexplored, but the subject will obviously grow in importance in the near future.
Motivation
There is growing interest in linked data and the Semantic Web in the statistical community. A large amount of statistical data from international and national agencies has already been published on the web of data, for example Census data from the U.S., Spain or France amongst others. In most cases, though, this publication is done by people outside the statistical office (see also http://datahub.io/dataset/istat-immigration, http://270a.info/ or http://eurostat.linked-statistics.org/), which raises issues such as long-term URI persistence, institutional commitment and data maintenance.
Statistical organisations are also interested in how the Semantic Web might make it simpler for analysts to use well-described statistical data in conjunction with other forms of data (e.g. geospatial information, scientific data, “big data” from various sources) that are expressed semantically. The ability to bring together diverse types of data in this way should enable new insights on multifaceted issues.
Statistical organizations also possess an important corpus of structural metadata such as concept schemes, thesauri, code lists and classifications. Some of those are already available as linked data, generally in SKOS format (e.g. FAO’s Agrovoc or UN’s COFOG). Semantic Web standards useful to statisticians have now reached maturity. The best examples are the W3C Data Cube, DCAT and ADMS vocabularies. The statistical community is also working on the definition of more specialized vocabularies, especially under the umbrella of the DDI Alliance. For example, XKOS extends SKOS for the representation of statistical classifications, and Disco defines a vocabulary for data documentation and discovery. The Visual Analytics Vocabulary is a first step towards semantic descriptions for user interface components developed to visualize Linked Statistical Data, which can lead to increased linked data consumption and accessibility. We are now at the tipping point where the statistical and the Semantic Web communities need a formal exchange in order to share experiences and tools and think ahead regarding the upcoming challenges.
Statisticians have a long-standing culture of data integrity, quality and documentation. They have developed industrialized data production and publication processes, and they care about data confidentiality and more generally how data can be used.
The web of data will benefit from rich data published by professional and trustworthy data providers. It is also important that metadata maintained by statistical offices, such as concept schemes of economic or societal terms, statistical classifications and well-known codes, are available as linked data, because they are of good quality, well-maintained, and they constitute a corpus to which a lot of other data can refer.
It seems that after a period where the aim was to publish as many triples as possible, the focus of the Semantic Web community is now shifting to having a better quality of data and metadata, more coherent vocabularies (see the LOV initiative), good and documented naming patterns, etc. This workshop aims to contribute to solving these longer-term problems in order to have a significant impact.
The statistics community sometimes faces challenges when trying to adopt Semantic Web technologies, in particular:
- difficulty in creating and publishing linked data: this can be alleviated by providing methods, tools, lessons learned and best practices, by publicizing successful examples and by providing support.
- difficulty in seeing the purpose of publishing linked data: we must develop end-user tools leveraging statistical linked data and provide convincing examples of real use in applications or mashups, so that the end-user value of statistical linked data and metadata appears more clearly.
- difficulty in using external linked data in their daily activity: it is important to develop statistical methods and tools especially tailored for linked data, so that statisticians can get accustomed to using them and become convinced of their specific utility.
To conclude, statisticians know how misleading it can be to exploit semantic connections without carefully considering and weighing information about the quality of these connections, the validity of inferences, etc. A challenge for them is to determine, to ensure and to inform consumers about the quality of semantic connections which may be used to support analysis in some circumstances but not others. The workshop will enable participants to discuss these very important issues.
Topics
The workshop will address topics related to statistics and linked data. This includes but is not limited to:
How to publish linked statistics?
- What are the relevant vocabularies for the publication of statistical data?
- What are the relevant vocabularies for the publication of statistical metadata (code lists and classifications, descriptive metadata, provenance and quality information, etc.)?
- What are the existing tools? Can the usual statistical software packages (e.g. R, SAS, Stata) do the job?
- How do we include linked data production and publication in the data lifecycle?
- How do we establish, document and share best practices?
How to use linked data for statistics?
- Where and how can we find statistics data: data catalogues, dataset descriptions, data discovery?
- How do we assess data quality (collection methodology, traceability, etc.)?
- How can we perform data reconciliation, ontology matching and instance matching with statistical data?
- How can we apply statistical processes on linked data: data analysis, descriptive statistics, estimation, correction?
- How to intuitively represent statistical linked data: visual analytics, results of data mining?
Submissions
This workshop is aimed at an interdisciplinary audience of researchers and practitioners involved or interested in Statistics and the Semantic Web. All papers must represent original and unpublished work that is not currently under review. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop. At least one author of each accepted paper is expected to attend the workshop.
Workshop participation is available to ISWC 2014 attendees at an additional cost, see http://iswc2014.semanticweb.org/registration for details.
The workshop will also feature a challenge based on Census Data published on the web or provided by Statistical Institutes. It is expected that data from Australia, France and Italy will be available. The challenge consists of building mashups or visualizations, as well as comparisons, alignment and enrichment of the data and concepts involved.
We welcome the following types of contributions:
- Full research papers (up to 12 pages)
- Short papers (up to 6 pages)
- Challenge papers (up to 6 pages)
All submissions must be written in English and must be formatted according to the information for LNCS Authors (see http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0). Please note that (X)HTML(+RDFa) submissions are also welcome as long as the layout complies with the LNCS style. Authors can for example use the template provided at https://github.com/csarven/linked-research. Submissions are NOT anonymous. Please submit your contributions electronically in PDF format at http://www.easychair.org/conferences/?conf=semstats2014 before July 21, 2014, 23:59 Hawaii Time. All accepted papers will be archived in electronic proceedings published by CEUR-WS.org.
See important dates and contact info on the workshop home page.
If you are interested in submitting a paper but would like more preliminary information, please contact semstats2014@easychair.org.
Chairs
Sarven Capadisli, Bern University of Applied Sciences, Switzerland, and University of Bonn, Germany
Franck Cotton, INSEE, France
Armin Haller, CSIRO, Australia
Alistair Hamilton, ABS, Australia
Monica Scannapieco, IStat, Italy
Raphaël Troncy, EURECOM, France
Program Committee
Phil Archer, W3C, UK
Ghislain Auguste Atemezing, Eurecom, France
Jay Devlin, Statistics New Zealand, New Zealand
Miguel Expósito Martín, Instituto Cántabro de Estadística, Spain
Dan Gillman, US Bureau of Labor Statistics, USA
Arofan Gregory, Metadata Technology NA, USA
Tudor Groza, School of ITEE, The University of Queensland, Australia
Christophe Guéret, Data Archiving and Networked Services (DANS), The Netherlands
Andreas Harth, AIFB, Karlsruhe Institute of Technology, Germany
Hak Lae Kim, Samsung Electronics
Laurent Lefort, CSIRO ICT Centre, Australia
Domenico Lembo, Sapienza University of Rome, Italy
Vincenzo Patruno, IStat, Italy
Marco Pellegrino, Eurostat, Luxembourg
Dave Reynolds, Epimorphics, UK
Hideaki Takeda, National Institute of Informatics, Japan
Wendy Thomas, Minnesota Population Center, USA
Bernard Vatant, Mondeca, France
Boris Villazón-Terrazas, iSOCO, Intelligent Software Components, Spain
Joachim Wackerow, GESIS – Leibniz Institute for the Social Sciences, Germany
Stuart Williams, Epimorphics, UK
Workshop preparation launched
After the big success of SemStats 2013 last October in Sydney, we look forward to a second edition in Riva del Garda, co-located with ISWC 2014. We are currently finalizing the workshop proposal.
More information soon on this page.