content detection from keywords
Briar Hamsfest Consulting s.r.o.
We provide detailed view from afar.
There are 198,000 unique keywords in the GEOSS infrastructure metadata. I could not resist connecting them based on their co-occurrence in individual records. And here is the outcome with community detection:
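A minimal sketch of the idea, not the workflow used for the picture: keywords that appear in the same record are linked, the edge weight counts how many records they share, and a simple label propagation pass groups them into communities. The toy records, the function names and the choice of label propagation are all assumptions made for illustration; the actual community detection method behind the image is not stated here.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_graph(records):
    """Link keywords that occur in the same record; weight = number of shared records."""
    weights = Counter()
    for keywords in records:
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), w in weights.items():
        graph[a][b] = w
        graph[b][a] = w
    return graph

def label_propagation(graph, max_rounds=20):
    """Each keyword repeatedly adopts the label with the largest total edge weight
    among its neighbours; ties go to the alphabetically smallest label."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            votes = Counter()
            for neighbour, w in graph[node].items():
                votes[labels[neighbour]] += w
            if not votes:
                continue
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

# Hypothetical metadata records, each reduced to its keyword list.
records = [
    ["ocean", "temperature", "salinity"],
    ["ocean", "salinity"],
    ["temperature", "ocean"],
    ["forest", "land cover", "biomass"],
    ["forest", "biomass"],
    ["land cover", "forest"],
]

graph = cooccurrence_graph(records)
labels = label_propagation(graph)

# Group keywords by community label for printing.
communities = defaultdict(list)
for keyword, label in labels.items():
    communities[label].append(keyword)
for members in communities.values():
    print(sorted(members))
```

On real data the graph would be far too large for this naive loop, and a graph library or Gephi's own modularity tools would be the more practical choice.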
I have recently finished the work on understanding what exactly the GEOSS global data infrastructure contains. Many partial approaches were shown in previous posts, and this one sums up the outcome. For all 1.8M metadata records from GEOSS we calculated document vectors and tried to run cluster detection with cosine similarity as … Read more: document vectors for clustering of large document sets
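For readers unfamiliar with the measure, here is a small sketch of cosine similarity between document vectors. It assumes plain term-count vectors built from whitespace-separated words, which is a stand-in chosen for brevity and not the document vectors computed for the GEOSS records; the sample texts and function names are made up for the example.

```python
import math
from collections import Counter

def doc_vector(text):
    """Represent a document as a sparse term-count vector (an illustrative stand-in)."""
    return Counter(text.lower().split())

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical metadata abstracts.
a = doc_vector("sea surface temperature grid")
b = doc_vector("sea surface salinity grid")
c = doc_vector("forest biomass map")

sim_ab = cosine_similarity(a, b)  # three of four terms shared
sim_ac = cosine_similarity(a, c)  # no terms shared
print(sim_ab, sim_ac)
```

Records whose vectors point in similar directions score close to 1 and are candidates for the same cluster, while records with nothing in common score 0.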
This time we linked institutions through keywords. No need to go any further, this is what we got:
What if the patterns in the institutional network, where metadata cite all contributing institutions, could actually lead us to knowledge of what GEOSS contains? To find out, we created a graph that included institutions as well as individual authors. The optimised network layout was hard to read at first, but we observed several important patterns. The resulting … Read more: GEOSS 3: content discovery through inter-institutional network patterns
When you have millions of datasets from tens of thousands of data providers, you may wonder what the overall picture looks like. Below you can find two pictures describing the whole architecture using just keywords and keyword co-location. The first picture shows how the keywords cluster and what difficulties we have to discover any … Read more: GEOSS 2: Bird’s eye perspective of the whole GEOSS content
Sometimes we can discover more information in metadata abstracts than in all other fields, especially when we have as many records as GEOSS can provide. This global data sharing architecture, which boasts 300 million metadata records on datasets and services, is pretty much operational and delivers data on a daily basis. Yet, nobody really knows what … Read more: GEOSS post 1: Semantic spaces of textual metadata content
Imagine you spent eight days training your word embedding on a 24-CPU server. Now you have a model and keep dreaming about what to do with it. My colleagues, who are cataloguing computational models used in impact assessments and preparatory studies, spend days and weeks discovering these models by reading the relevant documents. This neural network … Read more: Computational model discovery in the EU legislation corpus
What can we discover if we train a word embedding on the whole corpus of European legislation and supporting documents? This is what happens when you run similarity queries five-fold, and these are the domains you get:
We have a vast infrastructure of data and services at our fingertips, yet we need to know more about it as a whole. One of the interesting tasks is visualising how up to date the whole network actually is. So we jumped on our favourite Python/Gephi workflow and produced the following image.
Policy advice is a notoriously difficult task. Those who do decision-support analytics usually face what feel like negative deadlines, and there is never enough time to search for data; you just take what you have. The INSPIRE Directive attempted to provide this data fast, and fast means not only download times but search-to-download times. … Read more: Unravelling the INSPIRE network