Significantly improve the quality of data in your research by detecting incidences of random responses, satisficing, and bot participation in your online experiments and surveys.
Detection of this category of problems leads to better data quality, and better data quality means more accurate research results.
Presented in this article is a large battery of simple techniques, including open-ended questions, cognitive tasks, trick questions, traps and statistical inference. Most of the methods shown here can be readily applied to your research studies today.
Random responses are answers selected or given randomly in a survey. Stimulus-response type experiments are also vulnerable to random responding to stimuli (e.g., random keyboard key pressing rather than pressing the key the participant perceives to be correct). Online studies are particularly susceptible to random responses (Chandler & Paolacci, 2017; Clifford & Jerit, 2014), and their occurrence increases when financial rewards are offered and target populations are rare. Results containing even low numbers of random responses can be wildly distorted (Credé, 2010).
Bots (aka automated form fillers) are computer programs that fill out web forms, resulting in random responses. In our case, that means eligibility criteria, a demographics questionnaire, a research survey, etc. According to Dennis, Goodson, and Pearson (2018), the presence of bot-generated data in study results is becoming more prevalent as the number of online studies offering financial rewards increases. Bot programs are freely available to download online, easy to use, and plentiful (Buchanan & Scofield, 2018).
Malicious programs that generate purely invalid data in order to earn money (e.g., botnets, automated form-fillers, and survey bots) represent new and important economic and scientific threats to the completion of online questionnaires.
Satisficing is a practice where participants skim-read instructions and questions, or skip reading them altogether, in an attempt to reduce cognitive load. When participants satisfice, they choose answers that appear to fit the question or, in extreme cases, respond randomly (Krosnick, 1991). Participants who satisfice can be harder to detect than those who are proper outliers (e.g., random responders and bots) (Oppenheimer, Meyvis, & Davidenko, 2009).
The noise created by participants who fail to read instructions decreases the reliability of the data and increases the expense of running studies, because the number of participants needed to achieve a reliable result is artificially inflated.
Open-ended questions (Figure 1) are great for detecting satisficing, lack of attention and, most importantly, bots. Be on the lookout for nonsensical answers or answers consisting of random characters.
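As a rough illustration, the sketch below flags open-ended answers that look like random keyboard mashing. The vowel-ratio heuristic and its threshold are assumptions made for demonstration only; flagged answers should still be reviewed by hand.

import re

# Minimal heuristic for flagging suspicious open-ended answers.
# The vowel-ratio threshold is an illustrative assumption, not a
# validated cut-off.
def looks_like_gibberish(answer: str, min_vowel_ratio: float = 0.2) -> bool:
    text = answer.strip().lower()
    if not text:
        return True  # empty answers warrant a closer look
    letters = re.findall(r"[a-z]", text)
    if not letters:
        return True  # nothing but digits or symbols
    vowel_ratio = sum(c in "aeiou" for c in letters) / len(letters)
    # Random keyboard mashing tends to contain few vowels and no spaces
    return vowel_ratio < min_vowel_ratio and " " not in text

responses = ["I mostly buy apples and bananas", "sdkfjhsdkjfh", "1234567"]
print([r for r in responses if looks_like_gibberish(r)])  # ['sdkfjhsdkjfh', '1234567']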
Attention checks are designed to test a participant's understanding of the instructions they've been given at the beginning of a task. Attention checks can, at the very least, ensure that participants have not simply skipped reading the study instructions or satisficed whilst reading them. Although not an absolute measure of cheating, failure to read instructions may provide reasonable suspicion and thus warrant a closer look for anomalies in the participant's data.
For example, let us suppose that we provide some instructions before the beginning of a task and somewhere within those instructions is a condition that states:
Your task will be to answer 5 questions about fresh fruit purchases within the past year
We could then ask a question later in the study that probes this detail, for example, how many questions participants were told they would answer and on what topic; a simple way to score such a check is sketched below.
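This sketch assumes the participant's answers are collected in two hypothetical form fields; the expected values simply mirror the instruction quoted above.

# Hypothetical attention-check question: "According to the instructions,
# how many questions will you be asked, and about what topic?"
# The field names ("question_count", "topic") are assumptions for illustration.
def passes_attention_check(response: dict) -> bool:
    expected_count = "5"
    expected_topic = "fresh fruit"
    count_ok = response.get("question_count", "").strip() == expected_count
    topic_ok = expected_topic in response.get("topic", "").lower()
    return count_ok and topic_ok

print(passes_attention_check({"question_count": "5", "topic": "Fresh fruit purchases"}))  # True
print(passes_attention_check({"question_count": "10", "topic": "vegetables"}))            # False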
If participants complete an individual question, task or study in an unreasonably short time, then perhaps they've satisficed or provided random answers (Buchanan & Scofield, 2018; Downs, Holbrook, Sheng, & Cranor, 2010). Conversely, in a knowledge-based task, if they've taken longer than expected then they could be researching the answers rather than supplying responses from their current knowledge (Clifford & Jerit, 2016). Thus, checking response times for outliers is a useful way to identify participants whose data should be examined in further detail before deciding on inclusion or exclusion.
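One way to operationalize this is sketched below: completion times are flagged when they fall below an absolute floor or far from the sample median. The 10-second floor and the 3-MAD cut-off are illustrative assumptions that should be tuned to your own task.

import statistics

# Flag unusually fast or slow completion times (in seconds).
# The cut-offs (median +/- 3 * MAD, plus an absolute floor) are
# illustrative assumptions, not validated thresholds.
def flag_response_times(times, floor=10.0):
    med = statistics.median(times)
    mad = statistics.median(abs(t - med) for t in times) or 1.0
    return [i for i, t in enumerate(times) if t < floor or abs(t - med) > 3 * mad]

times = [95.0, 102.0, 88.0, 7.0, 310.0, 99.0]
print(flag_response_times(times))  # [3, 4] -> the 7 s and 310 s participants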
Instructional manipulation checks are quite similar to attention checks. With instructional manipulation checks, we instruct the participant exactly how to answer the question, rather than expecting them to provide a logical answer. Oppenheimer et al. (2009) showed that, amongst the students tested, 7% failed a simple instructional manipulation check. A variation of their blue-dot task can be seen in Figure 3.
Oppenheimer et al. (2009) recommend that participants who initially fail an instructional manipulation check be given another chance at passing the test. These authors present evidence that this can turn a satisficing participant into a diligent participant.
Caution: Instructional manipulation checks may produce a demand characteristic effect exhibited as systematic thinking instead of natural reactions to stimuli. This effect can be produced because after an initial instructional manipulation check, some participants remain on guard with a higher likelihood of critically analyzing the task at hand. Hauser and Schwarz (2015) provided evidence for this effect using a cognitive reflection test (e.g., Frederick, 2005). The researchers showed that response times were longer and task scores higher when participants were exposed to instructional manipulation checks before a cognitive reflection test, compared to those who were exposed to instructional manipulation checks afterwards.
In sum, researchers need to consider whether reducing the incidence of satisficing is worth the increased chance of participants engaging in systematic thinking rather than natural reactions when giving their responses.
Consistency checks (Jones, House, & Gao, 2015) can be an effective method for detecting bot activity and random responses, as well as fraudulent answers to eligibility criteria. Consistency checks look for agreement between answers to two or more related questions. For example:
What is your age (in years)?
and
What is your date of birth?
Inconsistent responses to these checks indicate a higher likelihood of random responding, satisficing, or bot activity.
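A minimal sketch of this particular cross-check, assuming the reported age and date of birth are available as structured values, might look like the following; the one-year tolerance is an illustrative assumption.

from datetime import date

# Cross-check the reported age against the reported date of birth.
# The one-year tolerance allows for rounding and recent birthdays.
def age_consistent(reported_age, dob, today, tolerance=1):
    computed_age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return abs(computed_age - reported_age) <= tolerance

print(age_consistent(34, date(1990, 6, 15), today=date(2024, 3, 1)))  # True
print(age_consistent(52, date(1990, 6, 15), today=date(2024, 3, 1)))  # False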
A false question is a question for which no answer is true. This method can be used to detect bots, satisficing, and random responding. These questions can be built in a variety of ways, from open-ended text questions to numeric questions to multiple choice, as long as any answer given is patently false.
For example, a multiple-choice question could ask participants which of several fictitious fruits they have purchased in the past year.
As none of these fruits actually exist, any answer given means the participant's activity can be viewed as highly suspicious and worthy of a closer look.
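A sketch of how such responses might be flagged is shown below; the fruit names are invented for illustration, and only a "none of the above" style response is treated as plausible.

# None of these fruits exist; the names are invented for illustration.
FICTITIOUS_FRUITS = {"pomaberry", "glasnut", "franglefruit"}

# Any participant who claims to have purchased a fictitious fruit is
# flagged for closer inspection.
def flag_false_question(selected_options):
    return bool(set(selected_options) & FICTITIOUS_FRUITS)

print(flag_false_question({"pomaberry"}))          # True  -> suspicious
print(flag_false_question({"none of the above"}))  # False -> plausible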
For detection of bot participation, the "honeypot" method can be very effective and is commonly used by blogs and forums to guard against bots posting spam messages. To take advantage of this method, you need to include a hidden form field in a survey or questionnaire. These form fields are invisible to humans but are seen by bots; therefore, if the field is not left empty, it is likely that the participant in question is indeed a bot.
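On the server side the check itself is trivial, as sketched below. The hidden field name is a hypothetical example, and hiding the field from humans (e.g., with CSS display: none) is handled in the survey page's markup.

# Server-side honeypot check on submitted form data.
# "contact_me_by_fax" is a hypothetical hidden field name; humans never
# see it, so any non-empty value suggests an automated form filler.
def is_probable_bot(form_data):
    return form_data.get("contact_me_by_fax", "").strip() != ""

print(is_probable_bot({"age": "29", "contact_me_by_fax": ""}))     # False
print(is_probable_bot({"age": "29", "contact_me_by_fax": "yes"}))  # True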
To prevent bots from taking part in your study you can include a cognitive task as a prescreen requirement which does not allow progression to the study proper unless passed (Liu & Wronski, 2018). The most commonly used method on the web for implementing these tasks is reCAPTCHA.
Author's note: reCAPTCHA is a simple cognitive challenge (courtesy of Google). A successfully completed task provides a very strong indicator that the participant is human.
Google: reCAPTCHA: Easy on Humans, Hard on Bots
Wikipedia: reCAPTCHA.
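If your study platform allows server-side code, the token returned by the reCAPTCHA widget can be verified against Google's siteverify endpoint along the lines sketched below; the secret key is a placeholder and error handling is omitted.

import requests

# Verify a reCAPTCHA token server-side via Google's siteverify endpoint.
# RECAPTCHA_SECRET is a placeholder for your own site's secret key.
RECAPTCHA_SECRET = "your-secret-key"
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def human_verified(recaptcha_token):
    resp = requests.post(
        VERIFY_URL,
        data={"secret": RECAPTCHA_SECRET, "response": recaptcha_token},
        timeout=10,
    )
    return resp.json().get("success", False)

# Call human_verified() with the token posted from the study's landing
# page before allowing the participant to proceed to the study proper.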
The risk of satisficing increases in lengthy and/or complex tasks, as participants become more likely to offload cognitive effort. Distraction and inattention are also more likely to occur when performing these tasks online rather than in-lab (Finley & Penningroth, 2015). Providing the participant with feedback on their performance after each trial, block of trials, or survey section can increase engagement, thus reducing the incidence of satisficing.
Statistical inference can be used to detect the presence of bot-generated responses in large datasets. Dupuis, Meier, and Cuneo (2018) compared seven indices and ranked them based on validity (i.e., the extent to which simulated data were identified correctly), feasibility (i.e., how easily the indices can be calculated), and potential specificities.
Based on effectiveness and ease of use, the researchers recommend three of these indices in particular, singling out the Mahalanobis distance and the person–total correlation. Dupuis et al. (2018) recommend that researchers use both of these indices for detecting bot activity in their online questionnaires because they are so easy to calculate. However, Downs et al. (2010) warn that these methods can be non-optimal when random responses make up a large share of the complete dataset.
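As a minimal sketch, the two indices singled out above can be computed over a participants × items response matrix with NumPy as follows; the simulated data and the flagging thresholds are illustrative assumptions rather than part of Dupuis et al.'s procedure.

import numpy as np

# responses: n_participants x n_items matrix of numeric answers.
def mahalanobis_distances(responses):
    centered = responses - responses.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(responses, rowvar=False))
    return np.sqrt(np.einsum("ij,jk,ik->i", centered, cov_inv, centered))

def person_total_correlations(responses):
    # Correlation between each participant's answers and the item means.
    item_means = responses.mean(axis=0)
    return np.array([np.corrcoef(row, item_means)[0, 1] for row in responses])

# Simulated example: 50 genuine respondents plus 5 uniform random responders.
rng = np.random.default_rng(0)
genuine = rng.normal(loc=[4, 4, 2, 5, 3], scale=0.5, size=(50, 5))
bots = rng.uniform(1, 5, size=(5, 5))
data = np.vstack([genuine, bots])

d = mahalanobis_distances(data)
r = person_total_correlations(data)
# Illustrative flagging rule: extreme distance or non-positive correlation.
suspects = np.where((d > np.percentile(d, 95)) | (r <= 0))[0]
print(suspects)  # indices flagged for closer inspection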
The propensity to satisfice is correlated with certain personality constructs and varies between individuals and with the cognitive load experienced (Krosnick, Narayan, & Smith, 1996). Participants who satisfice can arguably be considered legitimate samples; removing their data also removes a legitimate source of variability and therefore produces problems with generalizability (Oppenheimer et al., 2009).
Combining a number of the different strategies and techniques explained in this article can help reduce the occurrence of satisficing and the incidence of random responses from both humans and bots. Typically, most detection and prevention attempts by experiment and survey creators are made at the beginning of a study (i.e., prescreen checks). However, professional survey takers and experienced random responders may well be nonnaive to the strategies explained in this article and well aware that such checks most often appear during prescreening. It is common for experienced random responders to focus only on the prescreening checks before providing random responses for the rest of the study. Downs et al. (2010) suggest that spreading these checks throughout a study can prevent such behavior.