且构网

How do I use the MediaWiki API to get all images in one category that are not in another category?

Updated: 2022-10-17 18:55:43

By default, MediaWiki has no built-in support for building or querying category intersections. Accomplishing this task requires extensions, external tools, or multiple API queries combined with client-side processing of the results.

CirrusSearch API

On Wikimedia Commons, as on the whole Wikimedia wiki farm, CirrusSearch powers filtered search, including searches for category intersections, and it is also available through the API (action=query&list=search&srsearch=incategory:A+-incategory:B, which means Category:A minus Category:B).
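As a minimal sketch of the query above, the following Python builds the request URL and pulls titles out of a decoded response. The function names, the Commons endpoint constant, the quoting of category names, and the restriction to the File namespace (srnamespace=6) are assumptions added for illustration; they are not part of the original answer.

```python
from urllib.parse import urlencode

# Assumed endpoint: the standard api.php of Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_category_difference_url(include_cat, exclude_cat, limit=50,
                                  api_url=COMMONS_API):
    """Return an API URL searching pages in include_cat but not in exclude_cat."""
    params = {
        "action": "query",
        "list": "search",
        # Quotes keep category names that contain spaces together (assumption).
        "srsearch": f'incategory:"{include_cat}" -incategory:"{exclude_cat}"',
        "srnamespace": 6,  # File namespace, so only images/media are matched
        "srlimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def titles_from_response(data):
    """Extract page titles from a decoded list=search JSON response."""
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]
```

The URL can be fetched with any HTTP client; the JSON body, once decoded, is passed to titles_from_response. Search results are paged, so large categories need repeated requests.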

FastCCI

One tool I can recommend (because it is a dedicated high-performance solution and is actually running) is fastcci, developed by Daniel Schwen. For Wikimedia Commons specifically, a database is already maintained and a web service is running, but fastcci can be set up for any wiki, provided the tool has a host to run on and database access.

Query

Consider the following query URL:

https://fastcci.wmflabs.org/?c1=3302993&c2=15516712&d1=0&d2=0&s=200&a=not&t=js

  • https://fastcci.wmflabs.org/ - host that the Wikimedia Commons fastcci instance runs on
  • c1 - ID of category 1
  • c2 - ID of category 2
  • d1 - depth of category 1 to search in (fastcci by default considers sub-categories)
  • d2 - depth of category 2 to search in (fastcci by default considers sub-categories)
  • s - number of results to return
  • o - Offset
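  • a - conjunction used to combine the two categories (a=not in the example above, i.e. category 1 minus category 2)
  • t - connection type (t=js for a JSONP response; otherwise a WebSocket connection is assumed)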
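The parameter list above can be turned into a small URL builder. This is only a sketch: the function name and its defaults are hypothetical, and the host is simply the one from the example URL.

```python
from urllib.parse import urlencode

# Host taken from the example URL; other installations would use their own.
FASTCCI_HOST = "https://fastcci.wmflabs.org/"


def build_fastcci_url(c1, c2, d1=0, d2=0, size=200, offset=None, action="not"):
    """Build a fastcci JSONP query URL from the parameters described above."""
    params = {
        "c1": c1,      # page ID of category 1
        "c2": c2,      # page ID of category 2
        "d1": d1,      # search depth for category 1
        "d2": d2,      # search depth for category 2
        "s": size,     # number of results to return
        "a": action,   # conjunction, e.g. "not" for category 1 minus category 2
        "t": "js",     # request a JSONP response
    }
    if offset is not None:
        params["o"] = offset  # only sent when paging through results
    return FASTCCI_HOST + "?" + urlencode(params)
```

Calling build_fastcci_url(3302993, 15516712) reproduces the example URL shown earlier.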

Response

fastcciCallback( [ 'RESULT 27572680,0,0|1675043,0,0|27577015,0,0|27577043,0,0|27577106,0,0|27576896,0,0|27576790,0,0|23481936,0,0|17560964,0,0|11009066,0,0', 'OUTOF 10', 'DBAGE 378310', 'DONE'] );

RESULT is followed by a |-separated list of up to 50 integer triplets of the form pageId,depth,tag. Each triplet stands for one image or category.
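A minimal sketch of how the strings passed to fastcciCallback could be parsed, assuming they have already been extracted from the JSONP wrapper into a Python list. The function name and the shape of the returned dictionary are hypothetical; the OUTOF and DBAGE values are stored as plain integers without interpreting them.

```python
def parse_fastcci_lines(lines):
    """Parse the list of strings from a fastcci response into a dict."""
    parsed = {"results": [], "outof": None, "dbage": None, "done": False}
    for line in lines:
        # Each entry is a keyword, optionally followed by a space and a payload.
        keyword, _, payload = line.partition(" ")
        if keyword == "RESULT":
            for triplet in payload.split("|"):
                if not triplet:
                    continue  # tolerate an empty RESULT payload
                page_id, depth, tag = (int(x) for x in triplet.split(","))
                parsed["results"].append((page_id, depth, tag))
        elif keyword == "OUTOF":
            parsed["outof"] = int(payload)
        elif keyword == "DBAGE":
            parsed["dbage"] = int(payload)
        elif keyword == "DONE":
            parsed["done"] = True
    return parsed


# The sample response shown above, shortened to its first two triplets.
sample = ["RESULT 27572680,0,0|1675043,0,0", "OUTOF 10", "DBAGE 378310", "DONE"]
page_ids = [page_id for page_id, _, _ in parse_fastcci_lines(sample)["results"]]
```

Because a single RESULT line carries a limited number of triplets, the results are appended rather than overwritten, so several RESULT lines in one response accumulate in order.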

Resources

A note on pageIDs