We are using both ArcGIS and QGIS; each plays Mr. Good or Dr. Evil depending on the situation. Empirically, it turned out easier to adjust the format and size of a dataset to make it agreeable to one of the two products.
We’ll look at our hybrid base layer: the cadaster, corrected and updated with OSM. It will be our target layer, to which we join information from different sources to fill, where possible, the following fields:
- construction date
- style (to be optimistic)
- architect (to be super-optimistic)
- links to Wikipedia and other external sources.
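A minimal sketch of that target table, in Python with pandas (the column names here are our own assumptions, not the actual layer schema):

```python
import pandas as pd

# Hypothetical attribute table for the base layer; most fields start empty
# and get filled in from the sources described below.
base = pd.DataFrame({
    "building_id":   [101, 102],
    "address":       ["Tverskaya st, 1", None],
    "year_built":    [1937, None],   # construction date
    "style":         [None, None],   # to be optimistic
    "architect":     [None, None],   # to be super-optimistic
    "wikipedia_url": [None, None],   # links to external sources
})
print(base.isna().sum())  # how much of each field remains to be filled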
The cadaster data provides construction dates for 129 000 buildings; the rest has to be found elsewhere. We start with the Ministry of Culture data: a point layer with attributes that we join to the base layer to get addresses, names and links to photos, getting rid of unnecessary information about the life and times of a Communist leader whenever one happens to come into view.
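In practice this point-in-polygon join is a few clicks in QGIS (“Join attributes by location”) or a `geopandas.sjoin`. A dependency-free sketch of the same idea, with made-up footprints reduced to bounding boxes, might look like this:

```python
import pandas as pd

# Toy building footprints as bounding boxes (minx, miny, maxx, maxy);
# real footprints are polygons and the join is done in QGIS/geopandas.
buildings = pd.DataFrame({
    "building_id": [101, 102],
    "bbox": [(0, 0, 10, 10), (20, 0, 30, 10)],
})
# Hypothetical Ministry of Culture point layer: name, photo link, coordinates.
points = pd.DataFrame({
    "name": ["Merchant's house", "Former mansion"],
    "photo_url": ["http://example.org/a.jpg", "http://example.org/b.jpg"],
    "x": [5, 25],
    "y": [5, 5],
})

def containing_building(x, y):
    """Return the id of the footprint containing the point, if any."""
    for row in buildings.itertuples():
        minx, miny, maxx, maxy = row.bbox
        if minx <= x <= maxx and miny <= y <= maxy:
            return row.building_id
    return None

points["building_id"] = [containing_building(x, y)
                         for x, y in zip(points["x"], points["y"])]
joined = buildings.merge(points, on="building_id", how="left")
```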
Some buildings have historical and cultural monuments attached to them, and deleting the extra data points presents a problem. We are trying to solve it, by whatever means possible, without damaging valuable information.
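One way to avoid damaging anything is not to delete the extra points at all, but to collapse them into one row per building, concatenating their attributes. A sketch with invented monument names:

```python
import pandas as pd

# Several monument points can land on the same building; instead of
# deleting the "extras", aggregate their names into a single row.
monuments = pd.DataFrame({
    "building_id": [101, 101, 102],
    "monument_name": ["Plaque to N.", "Memorial to M.", "Old chapel"],
})
collapsed = (monuments.groupby("building_id")["monument_name"]
             .agg("; ".join)
             .reset_index())
print(collapsed)  # building 101 keeps both monument names
```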
Next we structure the Wikimapia data to derive a neat table filled with addresses, names, years and, where possible, styles and photos. Then we toss the filtered points onto the base layer.
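Flattening the nested Wikimapia records into a table, and fishing the construction year out of the free-text description, can be sketched like this (the field names are assumptions, not the actual Wikimapia API schema):

```python
import pandas as pd

# Hypothetical Wikimapia-style records.
raw = [
    {"title": "Apartment house", "description": "Built in 1912, Art Nouveau",
     "location": {"lat": 55.76, "lon": 37.61}},
    {"title": "School building", "description": "1936",
     "location": {"lat": 55.70, "lon": 37.55}},
]
table = pd.json_normalize(raw)  # nested 'location' becomes flat columns

# Pull a plausible four-digit year out of the description where possible.
table["year"] = table["description"].str.extract(
    r"\b(1[89]\d{2}|20\d{2})\b", expand=False)
```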
By now the table has several fields filled from different sources; some objects have their names and addresses filled in three times, in different formats. For names the choice is not wide, so we put things in order following the simple principle ‘take what you’re given’. The addresses, however, we can prioritize: the OSM format is the most accurate, so it is the first choice, followed by the cadaster, whose address field is a little clumsy yet has broader coverage. The remaining features get their addresses from Wikimapia. For instance, many of the houses in ‘New Moscow’ have addresses derived from Wikimapia; this input turned out to be far more valuable than the photos, most of which are of such poor quality that we decided not to use them at all.
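The priority rule above is just a coalesce: take OSM, fall back to the cadaster, then to Wikimapia. In pandas (column names invented for illustration):

```python
import pandas as pd

# One address per source; None where that source has no value.
df = pd.DataFrame({
    "addr_osm":       ["Arbat st, 10", None,           None],
    "addr_cadaster":  ["ul. Arbat 10", "Lenina st, 5", None],
    "addr_wikimapia": [None,           "Lenina 5",     "Novaya st, 3"],
})
# Coalesce by priority: OSM > cadaster > Wikimapia.
df["address"] = (df["addr_osm"]
                 .fillna(df["addr_cadaster"])
                 .fillna(df["addr_wikimapia"]))
```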
Yet it’s still important to get nice photos. By the laws of the adventure genre, at the last minute the great Wikidata shows up and delivers not so much data in absolute numbers (12 000 objects) as photos and links of very good quality. It also has a curious feature: the data quality is better for less popular objects, since well-known historical spots and tourist attractions have too many points piled on top of them, so irrelevant information sticks.
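Folding the Wikidata extract into the table follows the same pattern as before: a left join, then prefer the better Wikidata photo wherever one exists. A sketch (matching on address here is an assumption; keys, URLs and column names are invented):

```python
import pandas as pd

base = pd.DataFrame({
    "address": ["Arbat st, 10", "Lenina st, 5"],
    "photo_url": [None, "http://example.org/old.jpg"],
})
# Hypothetical Wikidata extract: high-quality photos and sitelinks.
wikidata = pd.DataFrame({
    "address": ["Arbat st, 10"],
    "wd_photo": ["http://commons.example.org/better.jpg"],
    "wikipedia_url": ["http://en.wikipedia.org/wiki/Arbat_10"],
})
merged = base.merge(wikidata, on="address", how="left")
# Prefer the Wikidata photo, keep the old one where Wikidata has nothing.
merged["photo_url"] = merged["wd_photo"].fillna(merged["photo_url"])
```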
So, from these pieces we have created a Frankenstein’s monster: cute and by no means scary.
Finally, we distribute all our buildings according to their age, which we have discovered for 129 000 of them:
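The bucketing behind such an age map can be sketched with `pd.cut` (the break points and sample years below are illustrative, not the ones we actually used):

```python
import pandas as pd

# A handful of sample construction years.
years = pd.Series([1785, 1873, 1912, 1936, 1958, 1979, 2005])

# Bucket them into periods; each bin is right-inclusive, e.g. (1850, 1917].
periods = pd.cut(years,
                 bins=[1700, 1850, 1917, 1960, 2020],
                 labels=["pre-1850", "1850-1917", "1917-1960", "after 1960"])
print(periods.value_counts().sort_index())
```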