In marketing research, as in other fields of study, web scraping is used as a method of collecting information. Essentially, it is the use of software to extract batches of data from the web according to specified parameters.
In keeping with the times, researchers from Erasmus University Rotterdam, Tilburg University, INSEAD and the University of Oxford have published a proposed methodological framework focused on improving the validity of web data while addressing the legal challenges surrounding this task.
New methodology to validate data extracted through web scraping
Although in most cases it may be permissible to collect information from publicly available sites, researchers still need to be careful about how they design their data extraction software.
Collecting information from publicly available user profiles can raise privacy concerns in some jurisdictions, which is why researchers are encouraged to anonymize their data during collection.
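As a rough illustration of what anonymizing during collection could look like, the Python sketch below replaces a username with a keyed hash and drops other direct identifiers before a record is stored. It is not taken from the researchers' framework. The field names, the DIRECT_IDENTIFIERS set and both function names are hypothetical, and keyed hashing is strictly speaking pseudonymization, so it may not satisfy every jurisdiction's definition of anonymous data.

```python
import hashlib
import hmac
import secrets

# Hypothetical project key: generated once and stored separately from the data.
SECRET_KEY = secrets.token_bytes(32)

# Hypothetical field names that identify a person directly.
DIRECT_IDENTIFIERS = {"username", "email", "profile_url"}


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest that stands in for the original identifier."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and keep only a pseudonym for the user."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["user"] = pseudonymize(record["username"], key)
    return cleaned


if __name__ == "__main__":
    scraped = {"username": "alice", "email": "alice@example.org",
               "rating": 4, "review": "Works as described."}
    print(anonymize_record(scraped))
```

Because the same key always maps a given username to the same pseudonym, reviews by one user can still be linked to each other without the username itself being stored.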
Given that the web is an important source of information for market research today, researchers need to ensure the validity of the data sets they extract. The research team developed a novel methodological framework that highlights how addressing validity issues requires joint consideration of the technical and the legal and ethical issues specific to each territory.
The authors, in conversation with the American Marketing Association, said of their methodological framework: “It covers the broad spectrum of validity issues that arise throughout the three stages of automated web data collection for academic use: data source selection, data collection design, and data extraction. In discussing the methodological framework, we offer a stylized marketing example for illustration. We also provide recommendations to address the challenges investigators face during web data collection through web scraping and APIs.”
Understanding the richness and versatility of web data is extremely valuable for academics who are curious about integrating it into their research programs. The article documenting this study also provides a systematic review of over 300 articles using web data published in the top five marketing journals. Based on this review, the researchers demonstrated how web data has taken over the design of marketing strategies.
The researchers also noted that they use their methodological framework and typology “to discover new and underexploited ‘goldfields’ associated with web data. We seek to demystify the use of web scraping and APIs, and thereby facilitate the broader adoption of web data across the marketing discipline. Our future research section highlights novel and creative ways to use web data, including exploring underused sources, building rich data sets from multiple sources, and fully exploiting the potential of APIs beyond data mining.”
The project's website offers plenty of material of interest to people engaged in research work. Alongside the database developed for this study, you can also access additional resources and tutorials on collecting data through APIs and web scraping.