What a journey! I’ve been working on this for a couple of months now, and I am proud to present the result. Have you ever been in a position where you had multiple ideas for analyzing a specific field or domain but just lacked the data? Getting that data is a tedious process: cleaning, preprocessing, architecture, reliability, availability… the list goes on. So I asked myself: wouldn’t it be nice to have a single source of data for my SciSci projects? A kind of datalake that could handle everything I throw at it? Citation analyses, disruption and impact indexing, concept mining and taxonomy generation. That is when the idea for the Science Datalake project was born.
The Science Datalake
tbd