Village DB: About

Welcome to the Roots Village Database, a digitization of the information from the Index of Clan Names By Villages published by the American Consulate General in Hong Kong in the 1970s. Originally used to investigate immigration fraud, this data is now valuable for genealogy research.

The data here comes from the Index of Clan Names By Villages. There are four books, one each for Toishan, Sunwui, Hoiping, and Chungshan. Posted here is the note from the reprint edition, along with the introductions from the original four volumes.

Data entry from all four volumes is complete. Please let us know if you find any errors.

Newly added is village data from Yanping/Enping 恩平, from the 恩平县志 (恩平县地方志编纂委员会编 2004. 北京市: 方志出版社). Thanks to Patrick Chew for digitizing this data.

Thanks to Him Mark Lai for wanting this to happen in the first place; to Beatrice Yu and Tony Tong for the early work starting in 2001; and to Andy Fong et al. for hosting the database at its current site.

Frequently Asked Questions

Q: What does "Map Location" in the Heungs mean?

A: According to the introduction of the Index, the map locations are keyed to the grid coordinates of the U.S. Army Map Service Series covering Kwangtung Province. The grid system is MGRS (Military Grid Reference System), but with older grid labels in which the second letter is off by ten positions in the grid alphabet (which skips I and O). Thus, "FQ8262" would be "FE8262" in today's MGRS grid (the full coordinates would be 49QFE8262). As a further complication, the map in the original Hoiping book erroneously uses the polyconic grid (with 10,000-yard grid marks) and swaps the horizontal and vertical components. Thus, for Hoiping, "FQ0589" should have been "FQ7473", which translates to 49QFE7473 in today's MGRS grid system.
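As a rough illustration of the letter shift described above, here is a minimal Python sketch. The function name is made up for this example, and the default zone "49Q" is an assumption taken from the two examples in the answer; it may not hold for every location in the Index. The sketch does not attempt the Hoiping polyconic correction, which has to be applied before this step.

```python
# Row letters used for MGRS 100 km squares: A-V without I and O (20 letters).
ROW_LETTERS = "ABCDEFGHJKLMNPQRSTUV"


def modernize_grid_label(old_label: str, zone: str = "49Q") -> str:
    """Turn an Index-style label such as 'FQ8262' into a modern MGRS string.

    Hypothetical helper: the zone default is an assumption based on the
    examples in the FAQ, not something stated for the whole Index.
    """
    label = old_label.strip().upper()
    if len(label) < 2:
        raise ValueError(f"label too short: {old_label!r}")
    column, row, digits = label[0], label[1], label[2:]
    if row not in ROW_LETTERS:
        raise ValueError(f"unexpected row letter in {old_label!r}")
    if digits and not digits.isdigit():
        raise ValueError(f"unexpected grid digits in {old_label!r}")
    # Shift the second letter by ten places, wrapping around the
    # 20-letter alphabet (so shifting forward or backward is equivalent).
    shifted = (ROW_LETTERS.index(row) + 10) % len(ROW_LETTERS)
    return f"{zone}{column}{ROW_LETTERS[shifted]}{digits}"


print(modernize_grid_label("FQ8262"))  # 49QFE8262
```

Because the alphabet has 20 letters, a shift of ten is its own inverse, so the same function would also map a modern label back to the older style.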

For your convenience we have converted the MGRS location to an approximate area viewable on Google Maps. (Please keep in mind that the precision of the original map locations is ±1000 m, and that street map data for China as shown on Google is offset from GPS coordinates, which adds a further margin of error.)

Technical Notes

This site requires CSS and cookies to be enabled, and works best with JavaScript turned on.

For those of you who are curious, the database runs on a MySQL backend, with Perl CGI scripts for the interface. The Chinese data was entered by volunteers using STC (Standard Telegraph Code), which maps a 4-digit code to a character. Apparently, there are two different telegraph encodings, one for Taiwan and one for mainland China. Our data uses the mainland variety, which can apparently be found in a book entitled 《電報明碼》. Naturally, this book is nowhere to be found, and the various tables out on the internet are rife with mistakes. (Update: I have since had the chance to search the large bookstores in Hong Kong, and the book is still nowhere to be found, though I didn't have time to search the big libraries there.) The telegraph data that this database uses is culled mainly from information put together by the Unicode people. This data, combined with a couple of other sources, gives us a telegraph code table of 7977 characters, which still appears to be missing a few. If anyone knows where I might find a more complete table, or the book, please let me know.
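To make the code-to-character mapping concrete, here is a small Python sketch (the site itself uses Perl; this is not its actual code). It assumes the Unicode data arrives as tab-separated lines in the style of the Unihan database, with a field named kMainlandTelegraph; that this is the exact source and field used here is an assumption. The three sample rows are for illustration only, and both function names are made up.

```python
# Illustrative sample rows in a Unihan-style "codepoint<TAB>field<TAB>value"
# layout. A real table would have thousands of rows.
SAMPLE_LINES = [
    "# sample data for illustration",
    "U+4E00\tkMainlandTelegraph\t0001",
    "U+4E2D\tkMainlandTelegraph\t0022",
    "U+6587\tkMainlandTelegraph\t2429",
    "U+4E2D\tkMandarin\tzhōng",
]


def load_telegraph_table(lines, field="kMainlandTelegraph"):
    """Build a dict mapping 4-digit telegraph codes to characters."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        codepoint, name, value = line.split("\t", 2)
        if name != field:
            continue  # ignore other fields, e.g. a Taiwan-specific one
        char = chr(int(codepoint[2:], 16))  # "U+4E2D" -> "中"
        for code in value.split():
            table.setdefault(code, char)  # keep the first character seen
    return table


def decode_stc(text, table, missing="?"):
    """Decode whitespace-separated codes; unknown codes become `missing`."""
    return "".join(table.get(code.zfill(4), missing) for code in text.split())


table = load_telegraph_table(SAMPLE_LINES)
print(decode_stc("0022 2429 9999", table))  # 中文?
```

Marking unknown codes with a placeholder instead of raising an error makes it easy to spot which entries fall into the gaps of an incomplete table.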

Romanizations for the characters are provided in jyutping (Cantonese) and pinyin (Mandarin). Great effort has been placed into making sure that (most of) these are correct. Let us know if you run into problems.


The Him Mark Lai Digital Archive
Siyi Genealogy
A Toisanese/Szeyap Bibliography

Version History

1.32 - 2020.07.27
1.31 - 2020.07.23
1.3 - 2020.07.22
1.2 - 2020.07.18
1.1 - 2020.07.09
1.0 - 2020.07.04
0.96 - 2017.10.18
0.95 - 2005.01.29
0.94 - 2005.01.10
0.93 - 2004.11.22
0.92 - 2004.06.26
0.91 - 2004.02.01
0.9 - 2004.01.27
0.83 - 2002.03.01
0.81 - 2002.01.21
0.8 - 2001.12.08
last modified 2020 July 28 by Dominic Yu