Wiki Word Statistics

A search for each of these words against the search database available on 7 March 2000 gave these results:

The database contained 7546 pages.

It's interesting to compare those last two separately, rather than combine them.

Please append to, rather than modify, these figures, so that we can compare against them at some later date. My guess would be that in, say, a year's time, the XP pages will be a lower proportion of the total, since the WikiMind will have drifted elsewhere. --KeithBraithwaite

May 12th, 2001

The database contained 15,289 pages. Searching for word hits didn't work too well, as the size of the resulting pages caused network dropouts.

The database grew by 102%. ExtremeProgramming grew by 75%. XP grew by 104%, Wiki grew by 161%, and patterns by 27%.
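For anyone appending later figures, the growth percentage is simply the change divided by the earlier value. A minimal sketch, using only the two page totals quoted on this page (the function name growth_percent is made up for illustration):

```python
def growth_percent(old: int, new: int) -> float:
    """Return the growth from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Page totals quoted on this page.
pages_2000 = 7546    # 7 March 2000
pages_2001 = 15289   # 12 May 2001

# Prints roughly 102.6%, in line with the "grew by 102%" quoted above.
print(f"{growth_percent(pages_2000, pages_2001):.1f}%")
```

The same function can be applied to per-word hit counts, provided both snapshots were searched the same way.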


It is part of the usual LanguageOrientedProgramming practice to look at the words that are used in a system. I had postponed this for a while (being new to the Wiki), but now I have done it.

The list at the end of this page is the first part of the output of processing wikiList.

If you do this on a software API you usually find something interesting. Special words, redundant words, wrong words ... but at first sight, I didn't find anything of significance.

Of course, if you look at the first few lines (strip some simple words) you find what this Wiki is about: Wiki Programming Patterns Extreme.

But when I read on, I felt like a shaman priest having thrown a bag of bones to read the present and the future:

to name just a few. Just try it! Perhaps some expert can read and interpret this. I'm unable to. -- HelmutLeitner
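If you want to "just try it", here is one way the word list could be produced. This is a sketch only, not the script actually used: it assumes that wikiList is a plain list of page names, that each WikiWord is split at its capital letters, and the names split_wiki_word and count_words are invented for the example.

```python
import re
from collections import Counter

# A word is a capital letter followed by lowercase letters or digits,
# so "ExtremeProgramming" splits into "Extreme" and "Programming".
WORD = re.compile(r"[A-Z][a-z0-9]*")


def split_wiki_word(name):
    """Split a WikiWord page name into its component words."""
    return WORD.findall(name)


def count_words(page_names):
    """Count how often each component word occurs across all page names."""
    counts = Counter()
    for name in page_names:
        counts.update(split_wiki_word(name))
    return counts


# A tiny made-up sample standing in for the real page list.
sample = [
    "ExtremeProgramming",
    "WikiWikiWeb",
    "DesignPatterns",
    "WikiWord",
    "ExtremeProgrammingChallenge",
]

for word, n in count_words(sample).most_common(3):
    print(n, word)
```

Counter.most_common returns the words in descending order of frequency, which is the ordering used by the list at the end of this page.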

If you play a little loose, the very first few words sum up Wiki pretty well:


See also WikiMines.


On a similar note, I am trying to devise a way to determine the "centres" of a wiki (or things similar to wikis). My best attempt so far has been http://usemod.com/cgi-bin/mb.pl?ShortestPathPages. -- SunirShah
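One possible reading of "centre", offered as an illustration and not necessarily what the linked page does: treat the wiki as a directed graph of links and pick the page with the smallest average link distance to every other page. The sketch below assumes the links are available as a dictionary from page name to linked page names; the function names are hypothetical.

```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first search: number of link hops from start to each reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def most_central(graph):
    """Return (page, mean distance) for the page closest on average to all others.

    Pages that cannot reach every other page are skipped.
    Returns None if no page qualifies.
    """
    pages = set(graph) | {p for links in graph.values() for p in links}
    best = None
    for page in sorted(pages):
        dist = distances_from(graph, page)
        if len(pages) < 2 or len(dist) < len(pages):
            continue
        mean = sum(dist.values()) / (len(pages) - 1)
        if best is None or mean < best[1]:
            best = (page, mean)
    return best


# A made-up three-page link graph, purely for illustration.
links = {
    "FrontPage": ["RecentChanges", "WikiWord"],
    "RecentChanges": ["FrontPage"],
    "WikiWord": ["FrontPage"],
}
print(most_central(links))
```

This runs one breadth-first search per page, which is fine for a sketch but would want something smarter on a wiki with many thousands of pages.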


As one might expect, the number of occurrences of a given WikiWord per WikiPage obeys a PowerLaw. This hypothesis was tested in March 2003 with 724 pages containing the WikiWord "UnitTest". A LogLog plot of the count of the pages with a given number of occurrences of "UnitTest" was created. The values are linear for the first two orders of magnitude, though they diverge from the ideal value as the number of occurrences of "UnitTest" per page increases:

Linear regression yields r-squared = 0.936.

A second test with "ExtremeProgramming" and 1,189 BackLinked pages gave a similar result, with r-squared = 0.950:

Binning the data increases r-squared to 0.99+.
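The test above can be sketched as follows. This is not the original analysis: the data points are invented, and since the page does not say how the binning was done, the logarithmic binning here (bins of 1, 2-3, 4-7, and so on) is an assumption. The idea is to fit a straight line to the points on LogLog axes and report r-squared.

```python
import math


def fit_loglog(points):
    """Least-squares line through (log10 x, log10 y).

    Returns (slope, intercept, r_squared). All x and y must be positive.
    """
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def log_bin(points):
    """Pool (occurrences, pages) pairs into bins [1,2), [2,4), [4,8), ...

    Returns (geometric bin centre, mean pages per integer occurrence count).
    """
    totals = {}
    for k, pages in points:
        b = k.bit_length() - 1          # bin index: floor(log2(k))
        totals[b] = totals.get(b, 0) + pages
    return [
        (math.sqrt(2 ** b * 2 ** (b + 1)), total / 2 ** b)
        for b, total in sorted(totals.items())
    ]


# Invented data: (occurrences per page, number of pages with that many).
observed = [(1, 400), (2, 110), (3, 48), (4, 22), (5, 17), (6, 9),
            (7, 9), (8, 5), (10, 3), (13, 2), (20, 1), (35, 1)]

print("raw:    slope %.2f, r-squared %.3f" % fit_loglog(observed)[::2])
print("binned: slope %.2f, r-squared %.3f" % fit_loglog(log_bin(observed))[::2])
```

Binning pools the sparse tail, where most occurrence counts are seen on only one or two pages, which is why it tends to straighten the line.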


The original version of this list counted each entry twice. This has been corrected.

Files: 1 Found: 80843

Count Statistic:

  1. The
  2. Wiki
  3. Of
  4. And
  5. Programming
  6. Patterns
  7. To
  8. Extreme
  9. Is
  10. Xp
  11. In
  12. Pattern
  13. Software
  14. For
  15. Java
  16. Language
  17. Design
  18. Object
  19. Test
  20. Code
  21. Page
  22. On
    1. Web
    2. What
    3. As
    4. Category
    5. With
    6. John
    7. Are
    8. Not
    9. Smalltalk
    10. Unit
    11. It
    12. Discussion
    13. You
    14. Two
    15. Refactoring
    16. One
    17. Use
    18. Mc
    19. Do
    20. Group
    21. Project
    22. David
    23. How
    24. New
    25. From
    26. By
    27. About
    28. This
    29. Component
    30. Topic
    31. Work
    32. At
    33. Testing
    34. Vs
    35. Be
    36. Ejb
    37. Objects
    38. System
    39. Systems
    40. Development
    41. Dont
    42. Meeting
    43. User
    44. Visual
    45. Name
    46. Why
    47. Your
    48. Model
    49. Tcpg
    50. Management
    51. Michael
    52. Time
    53. First
    54. Good
    55. Process
    56. Class
    57. People
    58. Method
    59. Refactor
    60. All
    61. Com
    62. Data
    63. Free
    64. Just
    65. More
    66. Net
    67. Server
    68. That
    69. Three
    70. Architecture
    71. Link
    72. Mark
    73. Problem
    74. Value
    75. Big
    76. Book
    77. Interface
    78. Changes
    79. No
    80. Peter
    81. An
    82. Bill
    83. Cpp
    84. Jim
    85. Meta
    86. Source
    87. Challenge
    88. Programmer
    89. Books
    90. Case
    91. Dot
    92. Exceptions
    93. List
    94. Open
    95. Pages
    96. Change
    97. Engineering
    98. Mike
    99. My
    100. Robert
    101. Computer
    102. Dave
    103. Plus
    104. Principle
    105. Game
    106. Links
    107. Microsoft
    108. Oriented
    109. Pair
    110. Tom
    111. De
    112. Eric
    113. Go
    114. Methodology
    115. Story
    116. James
    117. Knowledge
    118. Mode
    119. Richard
    120. Steve
    121. Thing
    122. Way
    123. Bob
    124. Me
    125. Mind
    126. Space
    127. Up
    128. World
    129. Art
    130. Business
    131. Chris
    132. Example
    133. Form
    134. Function
    135. Law
    136. Real
    137. Stories
    138. Technology
    139. Vb
    140. Ytwok
    141. Information
    142. Martin
    143. Nine
    144. Or
    145. Paul
    146. Python
    147. Things
    148. Tim
    149. Too
    150. Alan
    151. Anti
    152. Ats
    153. Community
    154. Framework
    155. History
    156. Recent
    157. State
    158. Team
    159. Tests
    160. Text
    161. Thomas
    162. When
    163. Write
    164. Analysis
    165. Bad
    166. Delete
    167. Great
    168. Isa
    169. Life
    170. Make
    171. Metaphor
    172. Perl
    173. Thread
    174. Twenty
    175. Words
    176. Works
    177. Basic
    178. Beans
    179. Box
    180. Can
    181. Music
    182. Need
    183. Public
    184. Thousand
    185. Uml
    186. Based
    187. Exception
    188. Home
    189. Idea
    190. Languages
    191. Quality
    192. Science
    193. Talk
    194. Who
    195. Word
    196. Brian
    197. Coding
    198. Does
    199. Four
    200. Functional
    201. Green
    202. Jeff
    203. Once
    204. Review
    205. Rule
    206. Rules
    207. Self
    208. Should
    209. Smith
    210. Stone
    211. Users
    212. Abstract
    213. Before
    214. Common
    215. Interfaces
    216. Like
    217. Non
    218. Oo
    219. Out
    220. Scott
    221. Script
    222. Seven
    223. Together
    224. Tool
    225. We
    226. Writing
    227. Browser
    228. Classes
    229. Document
    230. Factory
    231. Implementation
    232. Little
    233. Ninety
    234. Plan
    235. Programmers
    236. Reuse
    237. Right
    238. Solution
    239. View
    240. Anonymous
    241. Bug
    242. Comments
    243. Components
    244. Considered
    245. Dead
    246. Distributed
    247. Hard
    248. Its
    249. Joe
    250. Know
    251. Leadership
    252. Mac
    253. Machine
    254. Multi
    255. Order
    256. Other
    257. Post
    258. Problems
    259. Program
    260. Question
    261. Questions
    262. Style
    263. Types
    264. Visitors
    265. Andrew
    266. Bean
    267. Card
    268. Content
    269. Could
    270. Dan
    271. Database
    272. Documentation
    273. Edit
    274. Enterprise
    275. Faq
    276. Fic
    277. Frank
    278. Games
    279. Gof
    280. Greg
    281. Grok
    282. Int
    283. Love
    284. Man
    285. Message
    286. Only
    287. Over
    288. Paper
    289. Please
    290. Point
    291. Power
    292. Reviews
    293. Side
    294. Simple
    295. Six
    296. Soft
    297. Solutions
    298. Stephen
    299. Success
    300. Think
    301. Tools
    302. Unix
    303. Using
    304. Ward
    305. Will
    306. Agent
    307. Application
    308. Bruce
    309. Computing
    310. Daniel
    311. Definition
    312. Effect
    313. Entity
    314. Flow
    315. Immersion
    316. Kent
    317. Kevin
    318. Line
    319. Methods
    320. Null
    321. Person
    322. Reading
    323. Requirements
    324. Roger
    325. Ron
    326. Search
    327. Star
    328. Thinking
    329. Tips
    330. Well
    331. Workshop
    332. Another
    333. Cant
    334. Cards
    335. Cee
    336. Clear
    337. Culture
    338. Developer
    339. Domain
    340. Don
    341. Doug
    342. Editing
    343. End
    344. Evil
    345. Examples
    346. Full
    347. Future
    348. Get
    349. Harmful
    350. Has
    351. Have
    352. Here
    353. Junit
    354. Lazy
    355. Learning
    356. Library
    357. Map
    358. Modeling
    359. Old
    360. Oopsla
    361. Planning
    362. Plop
    363. Principles
    364. Pro
    365. Resource
    366. Second
    367. Simplest
    368. So
    369. Task
    370. Type
    371. Van
    372. Wall
    373. Win
    374. Active
    375. Analogy
    376. Back
    377. Bell
    378. Best
    379. Binary
    380. But
    381. Client
    382. Control
    383. Corporation
    384. Cplus
    385. Editor
    386. Emacs
    387. Five
    388. Fix
    389. George
    390. God
    391. Human
    392. Ideal
    393. Inheritance
    394. Long
    395. Most
    396. News
    397. Quote
    398. Reference
    399. Research
    400. Sand
    401. Session
    402. Single
    403. Society
    404. Stuff
    405. Theory
    406. Tri
    407. Variables
    408. William
    409. Writers
    410. Age
    411. Better
    412. Between
    413. Blue
    414. Bugs
    415. Builder
    416. Charles
    417. Command
    418. Complex
    419. Context
    420. Continuous
    421. Cool
    422. Copy
    423. Cost
    424. Death
    425. Driven
    426. Ed
    427. Edward
    428. Factor
    429. File
    430. Frameworks
    431. Guide
    432. He
    433. Hot
    434. Hyper
    435. Integration
    436. Keith
    437. Ken
    438. Keyboard
    439. Lisp
    440. Memory
    441. Multiple
    442. Names
    443. Nature
    444. Org
    445. Play
    446. Plug
    447. Processing
    448. Small
    449. Spaces
    450. Standard
    451. Structure
    452. There
    453. Trial
    454. University
    455. Values
    456. Ware
    457. Where
    458. Zen
    459. Applications
    460. Architect
    461. Architectural
    462. Around
    463. Author
    464. Black
    465. Blocks
    466. Build
    467. Call
    468. Composite
    469. Crc
    470. Cultural
    471. Douglas
    472. Down
    473. Environment
    474. Evolutionary
    475. Evolving
    476. Experiment
    477. External
    478. Failure
    479. Fast
    480. Forth
    481. Groups
    482. Ian
    483. Institute
    484. Inter
    485. Issues
    486. Jean
    487. Larry
    488. Linux
    489. Load
    490. Never
    491. Nick
    492. Os
    493. Own
    494. Possibly
    495. Practice
    496. Product
    497. Projects
    498. Proof
    499. Quotes
    500. Ralph
    501. Read
    502. Really
    503. Replace
    504. Risk
    505. Rob
    506. Role
    507. Room
    508. Sam
    509. Servlet
    510. Short
    511. Silicon
    512. Study
    513. Thirty
    514. Threads
    515. Tree
    516. Very
    517. Visitor
    518. Without
    519. Xml

See also HowWeTalk, WikiStatistics


CategoryWikiStructure CategoryStatistics