Hi. I've been a Wikipedian for many years and am a moderator of two Wikipedias. One of them is small, which makes me naturally interested in this kind of cross-wiki comparison.
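Currently, wikis are mainly assessed by depth, defined as

$$\text{Depth} = \frac{\text{Edits}}{\text{Articles}} \cdot \frac{\text{Non-articles}}{\text{Articles}} \cdot \left(1 - \frac{\text{Articles}}{\text{Pages}}\right)$$

where Non-articles = Pages − Articles.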
This metric poses a lot of problems:
- It doesn't make much sense and lacks a clear interpretation.
- It produces weird results, such as English Wikipedia being tens or even hundreds of times "better" than other big wikis, including those in major languages like French and German.
- It suffers from inflation for small wikis.
- It's easy to manipulate through the number of edits, the number of special pages, or both, and it's apparent that some wikis do exactly this.
- It makes large and small wikis effectively incomparable.
The second and third terms in this equation do exactly the same thing: they measure the contribution of special pages (talk pages, user pages, categories etc.). Multiplying them introduces a strong dependence on the number of special pages and causes huge inflation for small wikis.
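To make the overlap concrete, here is a short worked derivation. The symbol $x$ (non-article pages per article) is introduced only for this illustration, and it assumes the second term is non-articles per article and the third is the share of non-article pages among all pages. The page counts are taken from the table further down.

$$
\begin{aligned}
x &= \frac{\text{Pages} - \text{Articles}}{\text{Articles}}, \qquad
\text{second term} = x, \qquad
\text{third term} = 1 - \frac{\text{Articles}}{\text{Pages}} = \frac{x}{1 + x} \\
\text{second} \times \text{third} &= \frac{x^{2}}{1 + x} \approx x \quad \text{for large } x \\
\text{English:}\quad x &\approx 8.09 \;\Rightarrow\; \frac{x^{2}}{1 + x} \approx 7.2 \\
\text{Cree:}\quad x &\approx 166.3 \;\Rightarrow\; \frac{x^{2}}{1 + x} \approx 165.3
\end{aligned}
$$

Under these assumptions, the product for Cree is roughly 23 times the one for English before edits are even taken into account, because nothing bounds $x$ for a wiki with very few articles.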
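I experimented with various metrics and found that the simplest and best is

$$\text{Depth}^{*} = \frac{\text{Words}}{\text{Articles}} \cdot \left(1 - \frac{\text{Articles}}{\text{Pages}}\right)$$

So the third term is the same, the second is removed, and the first is altered.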
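The term $1 - \frac{\text{Articles}}{\text{Pages}}$ is a function of the type $\frac{x}{1 + x}$, where $x$ is the number of non-article pages per article. It grows more and more slowly and has an asymptote at 1.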
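As a minimal sketch, the metric can be computed like this, assuming Depth* is words per article times the share of non-article pages, rescaled so that English keeps its classic depth value. The names `depth_star`, `scaled_depth_star`, `WIKIS` and `ENGLISH_DEPTH` are made up for this example; the numbers are copied from the table below.

```python
def depth_star(words: int, articles: int, pages: int) -> float:
    """Unscaled Depth*: words per article times the share of non-article pages."""
    return (words / articles) * (1 - articles / pages)


# (words, articles, pages) copied from the table in this post
WIKIS = {
    "English": (5029491210, 7085151, 64391921),
    "Cebuano": (1326291436, 6115898, 11230117),
    "Vietnamese": (360188264, 1296630, 14609301),
    "Cree": (2272, 14, 2342),
}

# Classic depth of English Wikipedia from the table, used only as a scaling anchor
ENGLISH_DEPTH = 1336


def scaled_depth_star(name: str) -> float:
    """Depth* rescaled so that the English value equals ENGLISH_DEPTH."""
    scale = ENGLISH_DEPTH / depth_star(*WIKIS["English"])
    return scale * depth_star(*WIKIS[name])


if __name__ == "__main__":
    for name in WIKIS:
        print(f"{name}: {scaled_depth_star(name):.0f}")
```

The scaling is purely cosmetic: it multiplies every wiki by the same constant, so it changes neither the ranking nor the ratios between wikis.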
I tested this metric on various wikis (the biggest wikis, wikis in conlangs, plus some interesting ones) and here are the results:
| Wikipedia | Articles | Words | Pages | Words/Articles | Depth | Depth* |
|---|---|---|---|---|---|---|
| English | 7085151 | 5029491210 | 64391921 | 709.86 | 1336 | 1336 |
| Cebuano | 6115898 | 1326291436 | 11230117 | 216.86 | 2 | 209 |
| German | 3065898 | 1691653886 | 8411434 | 551.76 | 93 | 742 |
| French | 2718357 | 1835692395 | 13711543 | 675.29 | 274 | 1145 |
| Swedish | 2619342 | 489510386 | 6339381 | 186.88 | 18 | 232 |
| Dutch | 2201228 | 545554446 | 4737644 | 247.84 | 19 | 281 |
| Spanish | 2071945 | 1314624852 | 8522690 | 634.49 | 193 | 1016 |
| Russian | 2069963 | 1158572748 | 8412457 | 559.71 | 166 | 892 |
| Italian | 1942793 | 1061416495 | 8440484 | 546.34 | 195 | 889 |
| Polish | 1673719 | 540889052 | 3948992 | 323.17 | 36 | 394 |
| Ukrainian | 1396184 | 551840693 | 5004148 | 395.25 | 61 | 603 |
| Vietnamese | 1296630 | 360188264 | 14609301 | 277.79 | 536 | 535 |
| Portuguese | 1159159 | 623934501 | 5998524 | 538.26 | 206 | 918 |
| Catalan | 783388 | 407411496 | 1965859 | 520.06 | 42 | 662 |
| Finnish | 606606 | 180505998 | 1559044 | 297.57 | 37 | 384 |
| Czech | 579646 | 282045339 | 1616471 | 486.58 | 50 | 660 |
| Hungarian | 562443 | 243447405 | 1599186 | 432.84 | 60 | 593 |
| Serbo-Croatian | 461167 | 121817798 | 4626946 | 264.15 | 749 | 503 |
| Esperanto | 377333 | 92540340 | 846921 | 245.25 | 16 | 288 |
| Lithuanian | 223935 | 50390882 | 558880 | 225.02 | 30 | 285 |
| Latin | 140669 | 21762451 | 290730 | 154.71 | 15 | 169 |
| Ido | 59986 | 7872035 | 84997 | 131.23 | 2 | 82 |
| Volapük | 45855 | 3295145 | 163486 | 71.86 | 133 | 109 |
| Scots | 34282 | 7730427 | 138167 | 225.50 | 59 | 359 |
| Interlingua | 30146 | 3329674 | 45762 | 110.45 | 4 | 80 |
| Kotava | 29896 | 2748601 | 36342 | 91.94 | 0 | 34 |
| Interlingue | 13358 | 4024721 | 17638 | 301.30 | 0 | 155 |
| Sardinian | 7728 | 2366856 | 17522 | 306.27 | 17 | 362 |
| Kashubian | 5495 | 443007 | 8892 | 80.62 | 8 | 65 |
| Lingua Franca Nova | 4490 | 1603991 | 7185 | 357.24 | 2 | 283 |
| Pennsylvania German | 2039 | 154774 | 6042 | 75.91 | 68 | 106 |
| Novial | 1877 | 155531 | 4812 | 82.86 | 91 | 107 |
| Tetum | 1380 | 269417 | 3952 | 195.23 | 61 | 269 |
| Lojban | 1348 | 464157 | 5816 | 344.33 | 214 | 559 |
| Gothic | 976 | 99310 | 3946 | 101.75 | 113 | 162 |
| Dinka | 338 | 90704 | 1126 | 268.36 | 43 | 397 |
| Cree | 14 | 2272 | 2342 | 162.29 | 483867 | 341 |
My findings and remarks are as follows:
- I scaled the results so that the English value stays the same, which makes comparison easier.
- This metric doesn't produce weird results like the previous one.
- It's inflation-free (look at Cree).
- It cannot be manipulated through the number of edits because it doesn't use them.
- It's very hard to manipulate through the number of special pages (look at Serbo-Croatian and Vietnamese).
- Cheating with the number of special pages has a natural limit, which is words/articles.
- It makes it possible to compare all wikis, large and small.
- Maybe it can be manipulated by altering the script that counts words; admins would need to say.
- It has a simple form and a clear interpretation: length and/or quality of articles times contribution of the community and/or quality of articles.
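The "natural limit" remark can be illustrated with a small sketch. It assumes the unscaled form words/articles × (1 − articles/pages) and uses a purely hypothetical wiki (the word, article and page counts below are invented for the demonstration, not real data).

```python
def depth_star(words: int, articles: int, pages: int) -> float:
    """Unscaled Depth*: words per article times the share of non-article pages."""
    return (words / articles) * (1 - articles / pages)


# Hypothetical wiki: 10,000 articles averaging 200 words each
words, articles = 2_000_000, 10_000
ceiling = words / articles  # the value Depth* approaches but never reaches

if __name__ == "__main__":
    # Keep adding special pages and watch the metric saturate
    for pages in (20_000, 100_000, 1_000_000, 100_000_000):
        value = depth_star(words, articles, pages)
        print(f"{pages:>11} pages -> {value:7.2f} ({value / ceiling:.1%} of ceiling)")
```

In this example, going from 100,000 to 100,000,000 total pages lifts the value only from 180 to just under 200, so mass-creating special pages quickly stops paying off.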
I proposed this metric on the talk page for depth: https://meta.wikimedia.org/wiki/Talk:Wikipedia_article_depth#Depth%20metrics%20makes%20little%20sense%20-%20part%202.
UPDATE:

I made a graph for my statistics, plotting Depth* against the number of articles. It seems that the metric not only doesn't suffer from inflation for small Wikipedias, but is also generally lower for them, as we would expect from small Wikipedias where there are few editors to develop the wiki.
There seems to be some logarithmic dependence. Perhaps it would also make it possible to discern wikis that manipulated their number of articles. But I'd like to do an additional analysis with all wikis.
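One possible way to check the logarithmic dependence is an ordinary least-squares fit of Depth* against log10 of the article count. The sketch below is only an assumption about how such an analysis could be set up: `fit_log_trend`, `residuals` and `SAMPLE` are hypothetical names, and `SAMPLE` is a small subset of the table above, not the full data set.

```python
import math


def fit_log_trend(points):
    """Least-squares fit of y = intercept + slope * log10(n) for (n, y) pairs."""
    xs = [math.log10(n) for n, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(points):
    """Observed Depth* minus the fitted trend, per (articles, depth_star) pair."""
    intercept, slope = fit_log_trend(points)
    return [y - (intercept + slope * math.log10(n)) for n, y in points]


# (articles, Depth*) for a few wikis from the table: English, German, Polish,
# Esperanto, Latin, Ido, Kashubian, Novial, Cree
SAMPLE = [
    (7085151, 1336), (3065898, 742), (1673719, 394),
    (377333, 288), (140669, 169), (59986, 82),
    (5495, 65), (1877, 107), (14, 341),
]

if __name__ == "__main__":
    intercept, slope = fit_log_trend(SAMPLE)
    print(f"Depth* ~ {intercept:.1f} + {slope:.1f} * log10(articles)")
    for (n, y), r in zip(SAMPLE, residuals(SAMPLE)):
        print(f"{n:>9} articles: Depth* {y:>5}, residual {r:+.1f}")
```

Wikis with large negative residuals, meaning many articles but a Depth* far below the trend, would be candidates for a closer look; whether that really indicates a manipulated article count would still need the full analysis.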