Document vectors for clustering of large document sets

I recently finished my work on understanding what exactly the GEOSS global data infrastructure contains. Previous posts showed several partial approaches; this one presents the final outcome. For all 1.8M metadata records from GEOSS, we calculated document vectors and tried to run cluster detection with cosine similarity as …

GEOSS post 1: Semantic spaces of textual metadata content

Sometimes we can discover more information in metadata abstracts than in all other fields, especially when we have as many records as GEOSS provides. This global data-sharing architecture, which boasts 300 million metadata records on datasets and services, is largely operational and delivers data on a daily basis. Yet nobody really knows what …

Computational model discovery in the EU legislation corpus

Imagine you spent eight days training your word embedding on a 24-CPU server. Now you have a model and keep dreaming about what to do with it. My colleagues, who catalogue computational models used in impact assessments and preparatory studies, spend days and weeks discovering these models by reading the relevant documents. This neural network …