Artsy has always focused on where Art meets Science, and in July we hosted a meetup that really hit on both. We had a collection of Artsy staff, members of Art + Feminism NYC, the CocoaPods Peer Lab, New York Arts Practicum and volunteers from Wikimedia NYC all helping out.
We came with two aims:
- Help anyone interested in contributing to Wikipedia get started.
- Use The Art Genome Project (TAGP) to improve Wikidata entries for women Artists.
I helped out with the second part, and the rest of this post will be about the lessons learned during this editathon.
What is Wikidata?
Everyone knows Wikipedia, but fewer people know about Wikidata. We learned about it in the process of helping set up this meetup. Wikidata is a structured document store for generic items. The lexicon of keys that can go into a document is decided by community consensus.
The database is built around the notion of "semantic triples", which was new to us. The idea is that each Item (identified by a Q id, such as Q463639) has a bunch of associated Properties in the form:
subject, predicate, object
where the predicate means "has some relationship to".
In plain English: Ana Mendieta has a country of citizenship, which is the United States of America.
In essence, a Wikidata Item is just some structured data around a big bag of triples like the one above.
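To make the triple idea concrete, here is a minimal Python sketch. It is only an illustration, not how Wikidata stores anything internally: the `Triple` type, the `LABELS` table and the `describe` helper are names made up for this post, and the labels are hardcoded rather than fetched. P27 is Wikidata's "country of citizenship" Property and Q30 is the Item for the United States of America.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject has some relationship (predicate) to object."""
    subject: str    # Q id of the Item being described
    predicate: str  # P id of the Property
    obj: str        # value; here, another Item's Q id


# Human-readable labels, hardcoded for illustration (not fetched from Wikidata).
LABELS = {
    "Q463639": "Ana Mendieta",
    "P27": "country of citizenship",
    "Q30": "United States of America",
}


def describe(triple: Triple) -> str:
    """Render a triple as a plain-English sentence, falling back to raw ids."""
    s, p, o = (LABELS.get(part, part) for part in triple)
    return f"{s} has a {p} which is {o}"


citizenship = Triple("Q463639", "P27", "Q30")
print(describe(citizenship))
```

An Item is then, roughly, a Q id plus a list of such triples that all share it as their subject.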
Artsy + Wikidata
Luckily for this editathon, both Artsy Artist ID and TAGP ID were already inside Wikidata's controlled vocabulary of Properties. This meant we could think about how to connect items, rather than how to pitch that they were worth connecting at all.
We used Wikipedia to keep track of all the useful links to share among contributors.
As the majority of us were new to Wikidata, we scoped our projects to "get something small done." We ended up with three projects on the Wikidata side:
- Edit some Wikidata items manually to understand the process.
- Understand QuickStatements in order to do mass-updates of Wikidata items from Artsy data.
- Explore using pywikibot to ensure that updated Artsy details can be kept in sync with Wikidata.
We got some changes into Wikidata. 🎉
In preparing for this we also generated some data on Artists:
- Artsy Female and Nonbinary Emerging Artists
- Artsy Female and Nonbinary Artists with "Feminist Art" and "Contemporary Feminist" Genes
These were generated back in July, so if you're looking for up-to-date data, we recommend using the Artsy Developer API.
Updating Wikidata with data from Artsy
After spending some time familiarizing ourselves with the process of manually creating and editing Items, we moved on to some basic QuickStatements updates. QuickStatements is a simple text-based interface for updating multiple items and properties at once.
We ended up writing what would be the script for a single data item, based on hardcoded values.
By the end of the day we were able to enter basic biographical facts from Artsy's CSVs into Wikidata in one fell swoop, by batching up several QuickStatement instructions. In the future, we could write an "Artsy data to QuickStatement" script to handle larger imports.
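As a rough idea of what such an "Artsy data to QuickStatement" script could look like, here is a small Python sketch. It is not the script we wrote on the day. The CSV column names (`qid`, `artsy_slug`, `gender_qid`, `citizenship_qid`) and the sample row are hypothetical, and `P2042` is our assumption for the Artsy Artist ID Property, so check it on Wikidata before running a real batch. It emits the tab-separated form of QuickStatements commands, where Item values are bare Q ids and string values are wrapped in double quotes.

```python
import csv
import io

# P21 (sex or gender) and P27 (country of citizenship) are standard Properties.
SEX_OR_GENDER = "P21"
COUNTRY_OF_CITIZENSHIP = "P27"
ARTSY_ARTIST_ID = "P2042"  # assumed Property id, verify before a real import


def row_to_commands(row: dict) -> list:
    """Turn one CSV row into QuickStatements lines, skipping empty cells."""
    qid = row["qid"].strip()
    commands = []
    if row.get("artsy_slug"):
        # String values are quoted in QuickStatements.
        commands.append("\t".join([qid, ARTSY_ARTIST_ID, f'"{row["artsy_slug"]}"']))
    if row.get("gender_qid"):
        commands.append("\t".join([qid, SEX_OR_GENDER, row["gender_qid"]]))
    if row.get("citizenship_qid"):
        commands.append("\t".join([qid, COUNTRY_OF_CITIZENSHIP, row["citizenship_qid"]]))
    return commands


def csv_to_quickstatements(csv_text: str) -> str:
    """Convert a whole CSV export into one batch of commands."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        commands.extend(row_to_commands(row))
    return "\n".join(commands)


# Hypothetical sample export; the slug is an illustrative value.
SAMPLE_CSV = (
    "qid,artsy_slug,gender_qid,citizenship_qid\n"
    "Q463639,ana-mendieta,Q6581072,Q30\n"
)

print(csv_to_quickstatements(SAMPLE_CSV))
```

The printed text can be pasted into the QuickStatements tool as a single batch. Skipping empty cells means a partly filled row still produces valid commands for the facts we do have.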
One of the interesting aspects of looking through the data is that our Artists had a more nuanced set of gender identities than is currently available inside Wikidata's database. We didn't have enough time to address this, but as Wikidata is an ongoing project, anyone can add this in the future. If you're looking for a good first foray into Wikidata, this would improve the foundations for everyone.
Using pywikibot to update Wikidata
Most of the work is inside a Jupyter Notebook, which you can get a full preview of on GitHub.
We loved the idea of having code showing the incremental process as it's being eval'd. We got the bot to a point where it could edit a Wikidata item based on data exported from Artsy.
We plan to keep an eye on future efforts to coordinate Wikidata bot development, such as WikidataIntegrator.
We discussed what Artsy can do next. We have an idea of how we can connect our data to confirmed data on Wikidata: by keeping the Wikidata QID inside our databases too. This means that we can safely keep that data up to date.
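Here is one way that idea could play out, as a hedged Python sketch rather than anything we have built. `ArtistRecord` and `statements_to_add` are hypothetical names, and `existing_claims` stands in for whatever a bot would read from the Item; it is hardcoded here so the example runs without network access. The point is that a stored QID lets us compare our facts against one specific, confirmed Item and only propose what is missing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArtistRecord:
    """Hypothetical shape of an artist row on the Artsy side."""
    slug: str
    wikidata_qid: Optional[str] = None         # stored link to the Wikidata Item
    facts: dict = field(default_factory=dict)  # Property id -> value


def statements_to_add(artist: ArtistRecord, existing_claims: dict) -> list:
    """Return (qid, property, value) triples Artsy has that the Item lacks.

    existing_claims maps a Property id to the set of values already on the Item.
    """
    if artist.wikidata_qid is None:
        return []  # no confirmed match, so nothing is safe to write
    missing = []
    for prop, value in sorted(artist.facts.items()):
        if value not in existing_claims.get(prop, set()):
            missing.append((artist.wikidata_qid, prop, value))
    return missing


artist = ArtistRecord(
    slug="ana-mendieta",  # illustrative value
    wikidata_qid="Q463639",
    facts={"P27": "Q30", "P21": "Q6581072"},
)
on_wikidata = {"P27": {"Q30"}}  # pretend the Item already has citizenship

print(statements_to_add(artist, on_wikidata))
```

Records without a QID produce no edits at all, which is the "safely" part: the bot never has to guess which Item an artist corresponds to.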
We would love to do this again. It was exciting to have the project introduced to us, and we really get what they're trying to do. We want to host another, and you should come if you're in NYC!
If you're interested in exploring the Artsy Genome database, we recently updated The Art Genome Project's Genes and Definitions with all of our genes as a CSV under CC-A. We'd love to know if you find any interesting uses.