Research Data Management: Data Collection/Generation

Data collection

Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. The data collection component of research is common to all fields of study including physical and social sciences, humanities, business, etc. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same.

Northern Illinois University Faculty Development and Instructional Design Center. N.D. Responsible conduct of Research [Electronic]. Available at: Accessed: 27 July 2013

Primary data are generated and compiled by administering an original study, such as interviews, surveys, or focus groups. These types of data are designed to address a specific issue or information need that is not found in existing sources.

Secondary data come from information sources that already exist, such as statistical abstracts, state reports, historical studies, and other published literature.  These sources should be evaluated just as primary data are examined, and the information should be corroborated by using as many sources as feasible, given time and resources.

University of Illinois. A step-by-step guide to conducting a social profile for watershed planning [Electronic]. Available at: Accessed 27 July 2013


There are a number of methods available to researchers for collecting data. The commonly used data collection methods can be divided into the following types:

Mouton, J. 2001. How to succeed in your master's & doctoral studies: a South African guide and resource book. Pretoria: Van Schaik.

It is imperative that you document your data collection process as accurately and in as much detail as possible as a historical record for yourself and other possible researchers.

General guidelines for aspects of your project and data that you should document, regardless of your discipline, can be found on the library website under Research Data Management.  Useful information about basic, practical strategies for data management is available here.


Mouton, J. 2001. How to succeed in your master's & doctoral studies: a South African guide and resource book. Pretoria: Van Schaik
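As a minimal sketch of what a structured record of one data collection session might look like, the Python snippet below stores a log entry as JSON. The `CollectionRecord` class, its field names and the sample values are assumptions made for illustration, not a prescribed standard; adapt them to the guidelines for your discipline.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class CollectionRecord:
    """One entry in a data collection log (all field names are illustrative)."""
    project: str
    collected_on: str  # ISO 8601 date, e.g. "2013-07-27"
    method: str        # e.g. survey, interview, focus group
    instrument: str    # the collection instrument that was used
    collector: str
    notes: list = field(default_factory=list)


def to_json(record: CollectionRecord) -> str:
    """Serialise a record as indented JSON so it can be kept next to the data."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Hypothetical sample entry
record = CollectionRecord(
    project="Example study",
    collected_on=date(2013, 7, 27).isoformat(),
    method="survey",
    instrument="Previously validated questionnaire (hypothetical)",
    collector="A. Researcher",
    notes=["Two respondents skipped question 4."],
)

print(to_json(record))
```

Keeping one such entry per session, saved alongside the raw files, gives you and later researchers a dated account of how each batch of data was gathered.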

Data collection in the natural sciences (and increasingly in other fields) often requires sophisticated instrumentation, recording devices and scientific equipment.
Using previously validated collection instruments can save time and increase the study's credibility. However, remember that all data collection instrumentation, such as surveys, physiologic measures (blood pressure or temperature), or interview guides, must be identified and described.

Mouton, J. 2001. How to succeed in your master's & doctoral studies: a South African guide and resource book. Pretoria: Van Schaik.

Indiana State University Cunningham Memorial Library. Finding Research Instruments, Surveys, and Tests Libguide [Electronic]. Available at: Accessed 27 July 2013

The short video below by Ian Bailey-Mortimer explains the significance of populations and samples, census vs survey, open and closed questions and bias in questionnaires, sampling and interpretation.

Primary and Secondary Data

Publicly Accessible Secondary Data Sources

Google Dataset Search

Dataset Search is a search engine for datasets. Using a simple keyword search, users can discover datasets hosted in thousands of repositories across the web.

Re3data is a global registry of research data repositories that covers research data repositories from different academic disciplines. It includes repositories that enable permanent storage of and access to data sets to researchers, funding bodies, publishers, and scholarly institutions.

International Data Sources

Alpha Vantage Stock API for Global Markets [Open web access]
This is a web-based resource covering 100,000+ international financial and economic datasets, including stocks, ETFs, company financials, foreign exchange rates, and over 50 quantitative market signals. Multiple data querying methods are supported, including HTTP/web access and spreadsheets such as Excel and Google Sheets. Stellenbosch University affiliates can reach out to for technical assistance
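To illustrate the HTTP querying method, here is a minimal Python sketch that builds a query URL for a daily price series. The endpoint, the `TIME_SERIES_DAILY` function name and the `demo` key reflect Alpha Vantage's public documentation as we understand it and should be checked against the current documentation; the helper `build_query_url` is a hypothetical name introduced for this example. The snippet only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumed base endpoint; verify against the current Alpha Vantage documentation.
BASE_URL = "https://www.alphavantage.co/query"


def build_query_url(function: str, symbol: str, api_key: str, **extra: str) -> str:
    """Return a query URL for the given API function and ticker symbol."""
    params = {"function": function, "symbol": symbol, "apikey": api_key}
    params.update(extra)  # optional parameters, e.g. datatype="csv"
    return f"{BASE_URL}?{urlencode(params)}"


# "demo" is a placeholder key; replace it with your own API key.
url = build_query_url("TIME_SERIES_DAILY", "IBM", "demo")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client to retrieve the data.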

Demographic & Health Surveys:
The DHS Program assists developing countries worldwide in the collection and use of data to monitor and evaluate population, health, and nutrition programs

EU KLEMS is an industry-level growth and productivity research project. EU KLEMS stands for EU-level analysis of capital (K), labour (L), energy (E), materials (M) and service (S) inputs.

EU Open Data Portal:
The European Union Open Data Portal (EU ODP) gives you access to open data published by EU institutions and bodies. All the data you can find via this catalogue are free to use and reuse for commercial or non-commercial purposes.

The Food and Agriculture Organization of the United Nations (FAO) provides free access to food and agriculture data for over 245 countries and territories and covers all FAO regional groupings.

The FPMA Tool provides an easy way to access the large amounts of data present in the FAO database. It allows users to quickly browse single price series, create comparisons among countries, markets and commodities, and download charts, data and basic statistics such as averages, standard deviations and percentage changes.
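For readers who want to reproduce such basic statistics on a downloaded price series, the following Python sketch computes the average, the standard deviation and the period-to-period percentage changes. The price values are made up for illustration, and the use of the sample standard deviation is an assumption; the tool itself may use a different definition.

```python
from statistics import mean, stdev


def percentage_changes(prices):
    """Percentage change from each observation to the next."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(prices, prices[1:])]


def summarise(prices):
    """Average, sample standard deviation and percentage changes of a series."""
    return {
        "mean": mean(prices),
        "stdev": stdev(prices),  # sample standard deviation (n - 1 denominator)
        "pct_changes": percentage_changes(prices),
    }


# Illustrative monthly prices, not real data
prices = [100.0, 110.0, 99.0, 108.9]
summary = summarise(prices)

print(f"mean: {summary['mean']:.3f}")
print(f"stdev: {summary['stdev']:.3f}")
print("changes (%):", [round(change, 2) for change in summary["pct_changes"]])
```

A series of n prices yields n - 1 percentage changes, because each change compares an observation with the one before it.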

Global Findex Database:
The Global Findex database is the world’s most comprehensive data set on how adults save, borrow, make payments, and manage risk.

International Monetary Fund Data:
The IMF publishes a range of time series data on IMF lending, exchange rates and other economic and financial indicators. Manuals, guides, and other material on statistical practices at the IMF, in member countries, and of the statistical community at large are also available.

The Global Economy serves researchers, academics, investors, and business people who need reliable economic data on foreign countries. It provides up-to-date numbers for GDP, inflation, credit, interest rates, employment, and many other indicators. The data series are updated continuously based on the release dates of individual countries.

The World Bank Data Catalog:
DataBank is an analysis and visualisation tool that contains collections of time series data on a variety of topics. You can create your own queries; generate tables, charts, and maps; and easily save, embed, and share them.

Trade Map:
Trade statistics for international business development. Monthly, quarterly and yearly trade data. Import & export values, volumes, growth rates, market shares, etc.

World Development Indicators (World Bank):
The primary World Bank collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates

World Income Inequality Database (WIID):
The UNU-WIDER World Income Inequality Database ― widely known by its acronym WIID ― collects and stores information on income inequality for developed, developing, and transition countries.


African Data Sources

Africa Development Indicators (World Bank)
Africa Development Indicators was a primary World Bank collection of development indicators on Africa, compiled from officially recognized international sources. No further updates of this database are currently planned. See World Development Indicators for more recent data on Africa.

Africa Information Highway
The Africa Information Highway (AIH) was developed by the Statistics Department of AfDB as part of the Bank’s statistical capacity building program (SCB) in Africa. AIH is a mega network of live open data platforms (ODPs) electronically linking all African countries and 16 regional organizations.  The overall objective is to significantly increase public access to official and other statistics across Africa, while at the same time supporting African countries to improve data quality, management, and dissemination.

Afrobarometer is a pan-African, non-partisan research network that conducts public attitude surveys on democracy, governance, economic conditions, and related issues in more than 35 countries in Africa

Bureau of Economic Research (BER):
The database came into existence through the business survey conducted by the BER in the retail, manufacturing and building-and-construction sectors. The data is used mainly to indicate confidence levels among business owners in the country, as well as levels of other business-related issues (e.g. the percentage citing a shortage of skilled labour as a constraint). In addition, the BER has a library of research notes, comments and papers of economic relevance spanning the history of the South African economy.

Bureau of Market Research (BMR):
The BMR maintains and has access to a substantial range of secondary data from various sources. Data searches can be undertaken in almost any marketing, economic or social field.

Centre for Risk Analysis (Previously SAIRR):
The CRA provides its users with a complete spectrum of strategic intelligence reports, briefings, polls, scenarios, and bespoke advisory services on South Africa's rapidly evolving economic, social, policy and political climate. Have a look at their monthly Fast Facts as well as the South African Survey.

DataFirst:
DataFirst is a Research Unit and Data Service based at the University of Cape Town, South Africa. We give researchers online access to survey and administrative microdata (data at unit record level) from South Africa and other African countries. We assist researchers to use the data via our online helpdesk and offer formal training courses in microdata analysis.

Statistics South Africa:
South African Census data and reports. 

South African Reserve Bank:
Statistics from the South African Reserve Bank, including banking sector information, composite business cycle indicators, economic and financial data for South Africa, external debt, etc.

Wazimap:
Wazimap provides useful facts about places in South Africa. Compare places using tables and maps, download data, and embed charts on your site, making census data easy to use.

The list below is based on a collection maintained by Sarah Callaghan from The British Atmospheric Data Centre, but has been extended with what we found through internet searches (last updated 16/05/2018). The list is probably not complete. If you are aware of any other data journal, please let us know.



F1000Research (F1000 Research Ltd., Science Navigation Group)

Open access journal for life scientists offering immediate publication, transparent post-publication peer review, and full data deposition and sharing. All kinds of scientific work documentation are accepted, and data articles are citable.

Dataset Papers in Science (Hindawi Publishing Corporation)

Peer reviewed, open access journal that publishes dataset papers in a wide range of subjects in science and medicine

Data Science Journal (Committee on Data for Science and Technology (CODATA) of the International Council for Science (ICSU))

Peer-reviewed, open access journal publishing papers on the management of data and databases in science and technology, including descriptions of data systems, their publication on the internet, applications and legal issues. It also publishes data or data compilations, online simulations, databases, and other experiments.

Scientific Data (Nature Publishing)

Open-access, online-only publication for descriptions of scientifically valuable datasets, initially focusing on the life, biomedical and environmental science communities

Open research software

Peer-reviewed meta-journal describing research software, covers different aspects of creating, maintaining and evaluating open source research software

Data in Brief

Data in Brief is an open-access, peer-reviewed journal that welcomes submissions describing data from all research areas.

Data (Open Access Journal of Data in Science)

Data is an open access journal on data in science, with the aim of enhancing data transparency and reusability. The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published quarterly online by MDPI. Open Access and peer-reviewed.


Patterns is the home for data scientists and researchers in data-intensive sciences in both academia and industry. The journal is domain agnostic and offers breadth and depth across the spectrum of disciplines, including computational, physical, life, and social sciences, and the humanities.


Biology, Medicine, Life Sciences

Biodiversity Data Journal (Pensoft)

Community peer-reviewed, open-access, comprehensive online platform to accelerate publishing, dissemination and sharing of biodiversity-related data of any kind

Open Data Journal for Agricultural Research

The Open Data Journal for Agricultural Research (ODjAR) acts as a central hub for storing, curating and publishing data sets as a resource for the future, where publications and their authors get appropriate credit through citations and digital object identifiers. Many different data sets exist that are of value and deserve accreditation: experimental data, surveys, model inputs, model outputs, derived indicators and statistics, data assimilation and mark-ups, maps, and measured data points.

Genomics Data

To offer improved discoverability and accessibility to our authors' research, Elsevier is proud to announce the absorption of Genomics Data into Data in Brief, our multidisciplinary peer-reviewed data journal. Together, the journals will be able to better serve the genomics community as a unified outlet for your data.

BMC Research Notes (BioMed Central, Springer)

Open access journal covering all fields of biology and medicine. It publishes short publications, case series and incremental updates to previous work, and also encourages the publication of software tools, databases and data sets.

Ecological Archives (Ecological Society of America (ESA))

Publishes materials that are supplemental to articles appearing in the ESA journals, as well as peer-reviewed data papers with abstracts, in digital, Internet-accessible form. Kinds of publications: appendices, supplements, and data papers.

Biomedical Data Journal (BMDJ)

Biomedical Data Journal (BMDJ) is an open access journal aiming to facilitate the presentation, validation, use, and re-use of datasets, with focus on publishing biomedical datasets that can serve as a source for simulation and computational modelling of diseases and biological processes.


Ecology publishes articles that report on the basic elements of ecological research. Emphasis is placed on concise, clear articles documenting important ecological phenomena. The journal publishes a broad array of research that includes a rapidly expanding envelope of subject matter, techniques, approaches, and concepts: paleoecology through present-day phenomena; evolutionary, population, physiological, community, and ecosystem ecology, as well as biogeochemistry; inclusive of descriptive, comparative, experimental, mathematical, statistical, and interdisciplinary approaches.

GigaScience (BioMed Central)

Online open-access, open-data journal for the life and biomedical sciences. Its novel publication format links standard manuscript publication with an extensive database hosting all associated data, and it provides data analysis tools and cloud-computing resources.

Journal of Open Public Health Data (Ubiquity Press Open Access)

Open Health Data features peer-reviewed data papers describing health datasets with high reuse potential. We are working with a number of specialist and institutional data repositories to ensure that the associated data are professionally archived, preserved, and openly available. Equally importantly, the data and the papers are citable, and reuse will be tracked.


Earth Sciences, Geography

Geoscience Data Journal (Wiley)

Open access, online-only journal with scientific peer review. It publishes short data papers cross-linked to, and citing, datasets that have been deposited in approved data centers.

Earth System Science Data (Copernicus Publications)

Online, peer-reviewed, open-access data publishing journal. Datasets are annotated with standard metadata and made available through a certified data center or repository.

IUCrData (International Union of Crystallography)

IUCrData is a peer-reviewed open-access data publication from the International Union of Crystallography (IUCr). This innovative publication aims to provide short descriptions of crystallographic datasets and datasets from related scientific disciplines, as well as facilitating access to the data. The primary article category is Data Reports; these describe crystal structures of inorganic, metal-organic or organic compounds. Information on each crystal structure includes the crystallographic data (CIF and structure factors), a data validation report, figures and a text representation of the data.

International Journal of Spatial Data Infrastructures Research

IJSDIR is a peer-reviewed journal that is operated by Joint Research Centre of the European Commission. The aim of the Journal is to further the scientific endeavour underpinning the development, implementation and use of Spatial Data Infrastructures (SDIs). We welcome a range of submission types, including full-scientific articles, notes from the field and geospatial data set descriptions.
The Journal is published openly and free of charge. It adheres to the Open Archives Initiative, which aims to facilitate the dissemination of electronic content.


Physics, Chemistry

Journal of Physical and Chemical Reference Data (AIP Publishing LLC)

Published online daily to provide critically evaluated physical and chemical property data, fully documented as to the original sources and the criteria used for evaluation, preferably with uncertainty analysis. The journal is not intended as a publication outlet for original experimental measurements such as those normally reported in the primary research literature, nor for review articles of a descriptive or primarily theoretical nature

Journal of Chemical and Engineering Data

The Journal of Chemical & Engineering Data is a monthly peer-reviewed journal devoted to the publication of data obtained from both experiment and computation, which are viewed as complementary. The scope of the journal includes thermophysical properties obtained from quantum chemistry, molecular simulation, and molecular mechanics calculations, as well as reviews of experimental techniques. The journal publishes Articles and Reviews. In addition, Comments, Book Reviews, and Additions and Corrections are published.

Chemical Data Collections

The research data will be published as 'data articles' that support fast and easy submission and quick peer-review processes. Data articles introduced by CDC are short self-contained publications about research materials and data. The journal welcomes submissions focusing on (but not limited to) the following categories of research output: spectral data, syntheses, crystallographic data, computational simulations, molecular dynamics and models, physicochemical data, etc.

Atomic Data and Nuclear Data Tables

Atomic Data and Nuclear Data Tables presents compilations of experimental and theoretical information in atomic physics, nuclear physics, and closely related fields. The journal is devoted to the publication of tables and graphs of general usefulness to researchers in both basic and applied areas. Extensive and comprehensive compilations of experimental and theoretical results are featured. Supports Open Access.

Nuclear Data Sheets

The Nuclear Data Sheets are current and are published monthly. They are devoted to compilation and evaluations of experimental and theoretical results in Nuclear Physics. Supports Open Access.



The International Journal of Robotics Research - Data Papers (Sage Publications)

Publishes peer reviewed data papers and multimedia extensions alongside articles


Humanities, Archeology

Journal of Open Archaeology Data (Ubiquity Press Open Access)

The Journal of Open Archaeology Data (JOAD) features peer reviewed data papers describing archaeology datasets with high reuse potential.

Journal of Open Humanities Data (Ubiquity Press Open Access)

The Journal of Open Humanities Data (JOHD) features peer reviewed publications describing humanities data or techniques with high potential for reuse. Humanities subjects of interest to JOHD include, but are not limited to, Art History, History, Linguistics, Literature, Music, Philosophy and Religious Studies. Data that cross one or more of these traditional disciplines are highly encouraged.


Social Sciences, Economics

Research Data Journal for the Humanities and Social Sciences

The Research Data Journal is a digital-only open access journal, which documents deposited data sets through the publication of data papers. Data papers are scholarly publications of medium length containing a non-technical description of a data set and putting the data in a research context. The journal concentrates on the social sciences and the humanities, covering history, archaeology, language and literature in particular. The publication languages are English and Dutch.

Journal of Statistical Software

Established in 1996, the Journal of Statistical Software publishes articles, book reviews, code snippets, and software reviews on the subject of statistical software and algorithms. The contents are freely available on-line. Statistical software is the key link between statistical methods and their application in practice. Software that makes this link is the province of the journal, and may be realized as, for instance, tools for large scale computing, database technology, desktop computing, distributed systems, the World Wide Web, reproducible research, archiving and documentation, and embedded systems. We attempt to present research that demonstrates the joint evolution of computational and statistical methods and techniques.

Journal of Economics and Statistics

The Journal of Economics and Statistics publishes papers in all fields of economics and applied statistics. A specific focus is on papers combining theory with empirical analyses. Papers providing conclusions for economic policy in Europe are particularly welcome. Nevertheless, distinguished papers dealing exclusively with economic theory, empirical models, or economic history will not be excluded from consideration. The section Data Observer presents articles on data sets available for empirical research and institutions providing research data. The journal also publishes special issues, short comments, and book reviews.



Journal of Open Psychology Data (Ubiquity Press Open Access)

The Journal of Open Psychology Data (JOPD) features peer reviewed data papers describing psychology datasets with high reuse potential. Data papers may describe data from unpublished work, including replication research, or from papers published previously in a traditional journal. We are working with a number of specialist and institutional data repositories to ensure that the associated data are professionally archived, preserved, and openly available.


Stellenbosch University's Institutional Research Data Repository.