Dealing with bots, randoms and satisficing in online research

Article by Ben Howell

Photo by Matan Segev from Pexels

Significantly improve the quality of data in your research by detecting incidences of random responses, satisficing, and bot participation in your online experiments and surveys.

Detection of this category of problems leads to better data quality, and better data quality means more accurate research results.

Presented in this article is a large battery of simple techniques, including open-ended questions, cognitive tasks, trick questions, traps and statistical inference. Most of the methods shown here can be readily applied to your research studies today.


Definitions

What are random responses?

Random responses are answers selected or given randomly in a survey. Stimulus-response experiments are also vulnerable to random responding to stimuli (e.g., random keyboard key presses rather than pressing the key the participant perceives to be correct). Online studies are particularly susceptible to random responses (Chandler & Paolacci, 2017; Clifford & Jerit, 2014), and their occurrence increases where financial rewards are offered and target populations are rare. Results containing even low numbers of random responses can be wildly distorted (Credé, 2010).

What are bots?

Bots (aka automated form fillers) are computer programs that fill out web forms, resulting in random responses. In our case, that means eligibility criteria, a demographics questionnaire, a research survey, etc. According to Dennis, Goodson, and Pearson (2019), the presence of bot-generated data in study results is becoming more prevalent as the number of online studies offering financial reward increases. Bot programs are freely available to download online, easy to use, and plentiful (Buchanan & Scofield, 2018).

Malicious programs, generating purely invalid data in order to earn money (e.g., botnets, automated form-fillers, and survey bots) represent new and important economic and scientific threats to the completion of online questionnaires.

Dupuis, Meier, & Cuneo, 2018.

What is satisficing?

Satisficing is a practice where participants skim-read instructions and questions, or skip reading them altogether, in an attempt to reduce cognitive load. When participants satisfice, they choose answers that appear to fit the question or, in extreme cases, respond randomly (Krosnick, 1991). Participants who satisfice can be harder to detect than those who are proper outliers (e.g., random responders and bots) (Oppenheimer, Meyvis, & Davidenko, 2009).

The noise created by participants who fail to read instructions decreases the reliability of the data and increases the expense associated with running studies as the number of participants necessary to achieve a reliable result is artificially increased.

Oppenheimer, Meyvis, & Davidenko, 2009.

Detection and prevention strategies

Open-ended questions

Open-ended questions (Figure 1) are great for detecting satisficing, lack of attention, and most importantly, bots. Be on the look-out for nonsensical answers or answers consisting of random characters.

Figure 1. Open-ended question example (Psychstudio, 2019)
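
As one illustration (not a method prescribed in this article), simple text heuristics can pre-flag open-ended answers for manual review. The sketch below is in Python; the thresholds are assumptions and should be tuned against your own pilot data.

import re

def flag_suspicious_text(answer: str, min_words: int = 2) -> bool:
    """Pre-flag open-ended answers that look like bot output or random key presses.

    Heuristics (thresholds are illustrative assumptions):
    - empty answers or answers with fewer than `min_words` tokens
    - a very low proportion of vowels among the letters (key mashing)
    - long runs of digits/symbols
    Flagged answers should be reviewed by a human, not auto-excluded.
    """
    text = answer.strip()
    if len(text.split()) < min_words:
        return True
    letters = re.sub(r"[^A-Za-z]", "", text)
    if not letters:
        return True
    vowel_ratio = sum(c in "aeiouAEIOU" for c in letters) / len(letters)
    if vowel_ratio < 0.2:  # typical English prose is roughly 35-40% vowels
        return True
    if re.search(r"[^A-Za-z\s]{6,}", text):  # long runs of digits/symbols
        return True
    return False


print(flag_suspicious_text("asdkjh qwkjrh zxnm"))               # True
print(flag_suspicious_text("I mostly buy apples and bananas"))  # False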

Attention checks

Attention checks are designed to test a participant's understanding of the instructions they were given at the beginning of a task. At the very least, attention checks can confirm that participants have not simply skipped reading the study instructions or satisficed whilst reading them. Although not an absolute measure of cheating, failure to read instructions may provide reasonable suspicion and thus warrant a closer look for anomalies in the participant's data.

For example, let us suppose that we provide some instructions before the beginning of a task and somewhere within those instructions is a condition that states:

Your task will be to answer 5 questions about fresh fruit purchases within the past year

We could then ask the following question later in the study:

Figure 2. Attention check question example (Psychstudio, 2019)
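
As a minimal sketch (not taken from the article), the check in Figure 2 could be scored as follows. The participant identifiers and the expected answer of 5 are assumptions based on the example instructions above, and failures are flagged for closer inspection rather than automatic exclusion.

def flag_failed_attention_checks(responses, expected="5"):
    """Return the participants whose answer to the Figure 2 attention check
    does not match the instructed value (5 questions about fruit purchases).

    responses: dict mapping participant id -> answer as a string.
    Failures warrant a closer look at that participant's data,
    not automatic exclusion.
    """
    return {pid: answer for pid, answer in responses.items()
            if answer.strip() != expected}


print(flag_failed_attention_checks({"p01": "5", "p02": "10", "p03": " 5 "}))
# {'p02': '10'}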

Timing checks

If participants complete an individual question, task or study in an unreasonably quick time, then perhaps they've satisficed or provided random answers (Buchanan & Scofield, 2018; Downs, Holbrook, Sheng, & Cranor, 2010). Conversely, in a knowledge-based task, if they've taken longer than expected, they could be researching the answers rather than responding from their current knowledge (Clifford & Jerit, 2016). Thus, checking response times for outliers is a useful way to identify participants whose data should be examined in further detail before deciding on inclusion or exclusion.
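
For example, a minimal outlier screen on completion times might look like the sketch below, assuming times are recorded per participant in seconds; the minimum plausible time and the z-score cutoff are placeholder assumptions to calibrate against pilot data.

import statistics

def flag_outlier_times(times_sec, min_plausible=2.0, z_cutoff=3.0):
    """Flag completion times that are implausibly fast or unusually slow.

    times_sec: dict mapping participant id -> completion time in seconds.
    Returns a dict of flagged ids and the reason, for manual review.
    """
    mean = statistics.mean(times_sec.values())
    sd = statistics.stdev(times_sec.values())
    flagged = {}
    for pid, t in times_sec.items():
        if t < min_plausible:
            flagged[pid] = "implausibly fast"
        elif sd > 0 and (t - mean) / sd > z_cutoff:
            flagged[pid] = "unusually slow"
    return flagged


print(flag_outlier_times({"p01": 42.0, "p02": 0.8, "p03": 51.5, "p04": 47.2}))
# {'p02': 'implausibly fast'}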

Instructional manipulation checks

Instructional manipulation checks are quite similar to attention checks, except that we instruct the participant exactly how to answer the question rather than expecting a genuine answer to the question itself. Oppenheimer et al. (2009) showed that amongst students tested, 7% failed a simple instructional manipulation check. A variation of their blue-dot task can be seen in Figure 3.

Figure 3. Instructional manipulation check question example (Oppenheimer, Meyvis, & Davidenko, 2009; Psychstudio, 2019)

Oppenheimer et al. (2009) recommend that participants who initially fail an instructional manipulation check be given another chance at passing the test. These authors present evidence that this can turn a satisficing participant into a diligent participant.

Caution: Instructional manipulation checks may produce a demand characteristic effect, exhibited as systematic thinking instead of natural reactions to stimuli. This can occur because, after an initial instructional manipulation check, some participants remain on guard and are more likely to critically analyze the task at hand. Hauser and Schwarz (2015) provided evidence for this effect using a cognitive reflection test (e.g., Frederick, 2005). The researchers showed that response times were longer and task scores higher when participants were exposed to instructional manipulation checks before a cognitive reflection test, compared with those who were exposed to the checks afterwards.

In sum, researchers need to consider whether reducing the incidence of satisficing is worth the increased chance of participants engaging in systematic thinking rather than natural reactions when giving their responses.

Consistency checks

Consistency checks (Jones, House, & Gao, 2015) can be an effective method for detecting bot activity and random responses, as well as fraudulent answers to eligibility criteria. They check for agreement between answers to two or more related questions. For example:

What is your age (in years)?

and

What is your date of birth?

Inconsistent responses to such question pairs indicate a higher likelihood of random responding, satisficing, or bot activity.
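
A sketch of the age/date-of-birth pairing above is shown below; the field names, the survey completion date, and the one-year tolerance are assumptions to adapt to your own questionnaire.

from datetime import date

def ages_consistent(reported_age, dob, completed_on, tolerance=1):
    """Check that a reported age agrees with a reported date of birth.

    completed_on: the date the survey was completed.
    A tolerance of +/- 1 year allows for rounding and recent birthdays.
    Returns False when the pair should be flagged as inconsistent.
    """
    age_from_dob = (completed_on.year - dob.year
                    - ((completed_on.month, completed_on.day) < (dob.month, dob.day)))
    return abs(age_from_dob - reported_age) <= tolerance


print(ages_consistent(35, date(1984, 6, 2), completed_on=date(2019, 8, 1)))  # True
print(ages_consistent(27, date(1984, 6, 2), completed_on=date(2019, 8, 1)))  # False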

False questions

A false question is a question to which no answer is true. This method can be used to detect bots, satisficing and random responding. These questions can be built in a variety of ways, from open-ended text questions, to number questions, to multiple choice, as long as any answer a participant gives is patently false.

For example, a question about knowledge of fruit could be posed:

Figure 4. False question example (Psychstudio, 2019)

As none of these fruits actually exist, any answer given means the participant's activity can be viewed as highly suspicious and worthy of a closer look.

Honeypot method

For detection of bot participation, the "honeypot" method can be very effective and is commonly used by blogs and forums to guard against bots posting spam messages. To take advantage of this method, include a hidden form field in your survey or questionnaire. These fields are invisible to human participants but are seen by bots, so if the field does not come back empty, it is likely that the participant in question is indeed a bot.
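
A minimal server-side sketch of the check is given below, assuming a hypothetical hidden field named "website" that is rendered invisible to humans (e.g. via CSS display: none) but still submitted with the form.

def is_probable_bot(form_data, honeypot_field="website"):
    """Return True if the hidden honeypot field came back non-empty.

    form_data: dict of submitted field names -> values.
    The field name "website" is an illustrative assumption; any hidden
    field that humans never see (and therefore never fill in) will do.
    """
    return bool(form_data.get(honeypot_field, "").strip())


print(is_probable_bot({"age": "29", "website": ""}))                 # False
print(is_probable_bot({"age": "29", "website": "http://example"}))   # True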

Cognitive tasks

To prevent bots from taking part in your study you can include a cognitive task as a prescreen requirement which does not allow progression to the study proper unless passed (Liu & Wronski, 2018). The most commonly used method on the web for implementing these tasks is reCAPTCHA.

Author's note: reCAPTCHA is a simple cognitive challenge (courtesy of Google). A successfully completed task provides a very strong indicator that the participant is human.
Google: reCAPTCHA: Easy on Humans, Hard on Bots
Wikipedia: reCAPTCHA.
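
If you handle form submissions yourself, a minimal sketch of server-side token verification (here using Python's requests library) might look like the following; the function name is an assumption, while the siteverify endpoint and its secret/response parameters are part of Google's documented reCAPTCHA API.

import requests

def verify_recaptcha(token, secret_key):
    """Verify a reCAPTCHA response token on the server.

    token: the value posted with the form by the reCAPTCHA widget
           (the g-recaptcha-response field).
    secret_key: your site's secret key from the reCAPTCHA admin console.
    Returns True only if Google confirms the challenge was solved.
    """
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={"secret": secret_key, "response": token},
        timeout=10,
    )
    return resp.json().get("success", False)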

Performance feedback

The risk of satisficing increases in lengthy and/or complex tasks, as participants become more likely to offload cognitive effort. Distraction and inattention are also more likely when performing these tasks online rather than in-lab (Finley & Penningroth, 2015). Providing the participant with feedback on their performance after each trial, block of trials, or section of a survey can increase engagement, thus reducing the incidence of satisficing.

Statistical induction

Statistical induction can be used to detect the presence of bot-generated responses in large datasets. Dupuis, Meier, and Cuneo (2018) compared seven indices and ranked them based on validity (i.e., the extent to which simulated data were identified correctly), feasibility (i.e., how easily the indices can be calculated) and potential specificities.

Based on effectiveness and ease of use, the researchers recommend the following three indices:

  • Response coherence: Correlative index that indicates whether responses to a questionnaire are clear and understandable. Highly accurate and robust, but not easy to calculate. At the time of writing, an R package was under development, so it is worth checking whether it has been released when analysis time rolls around.
  • Mahalanobis distance: Outlier detection index that indicates the distance between one response set and the collection of all other response sets. This index is easy to calculate in R (Mahalanobis function in the stats package) and in SPSS.
  • Person–total correlation: Correlative index that measures the difference between the sum of scores on each item response and the mean score of the item responses across all other response sets. This index can be calculated in R and SPSS, as well as in a spreadsheet using correlation functions.

Dupuis et al. (2018) recommend researchers use both the Mahalanobis distance and person–total correlation indices for detection of bot activity in their online questionnaires because these indices are so easy to calculate. However, Downs et al. (2010) warn that when there are large numbers of random responses in the complete dataset, these methods can be non-optimal.
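
For researchers working in Python rather than R or SPSS, the two recommended indices can be sketched with NumPy and SciPy as below; the simulated dataset and the flagging thresholds are purely illustrative assumptions, and flagged rows should be inspected rather than excluded automatically.

import numpy as np
from scipy import stats
from scipy.spatial.distance import mahalanobis


def mahalanobis_distances(X):
    """Distance of each response set (row) from the sample centroid,
    scaled by the (pseudo-)inverse covariance of the items (columns)."""
    mu = X.mean(axis=0)
    VI = np.linalg.pinv(np.cov(X, rowvar=False))
    return np.array([mahalanobis(row, mu, VI) for row in X])


def person_total_correlations(X):
    """Correlation between each participant's item responses and the item
    means of all other participants (leave-one-out). Low or negative values
    suggest random responding."""
    n = X.shape[0]
    r = np.empty(n)
    for i in range(n):
        others_mean = np.delete(X, i, axis=0).mean(axis=0)
        r[i] = stats.pearsonr(X[i], others_mean)[0]
    return r


# Illustrative simulated data: 200 genuine responders who roughly track the
# item means, plus 5 "bots" answering uniformly at random on a 1-7 scale.
rng = np.random.default_rng(0)
item_means = np.linspace(2.5, 5.5, 10)
genuine = np.clip(np.round(rng.normal(item_means, 1.0, size=(200, 10))), 1, 7)
bots = rng.integers(1, 8, size=(5, 10)).astype(float)
X = np.vstack([genuine, bots])

d = mahalanobis_distances(X)
r = person_total_correlations(X)
suspect = np.where((d > np.percentile(d, 95)) | (r < 0))[0]
print(suspect)  # row indices worth a closer look before deciding on exclusion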


A cautionary note

The propensity to satisfice is correlated with certain personality constructs, and varies between individuals and with the cognitive load experienced (Krosnick, Narayan, & Smith, 1996). Participants who satisfice can arguably be considered legitimate samples; removing their data also removes a legitimate source of variability and can therefore produce problems with generalizability (Oppenheimer et al., 2009).


Conclusion

Combining a number of the different strategies and techniques explained in this article can help reduce the occurrence of satisficing and the incidence of random responses from both humans and bots. Typically, most detection and prevention attempts by experiment and survey creators are made at the beginning of a study (i.e., prescreen checks). However, professional survey takers and experienced random responders may well be nonnaive to the strategies explained in this article, and well aware that these checks most often form part of a prescreen. It is common for experienced random responders to apply themselves only to the prescreening checks and then provide random responses for the rest of the study. Downs et al. (2010) suggest that spreading these checks throughout a study can prevent such behavior.


References

  1. Buchanan, E., & Scofield, J. (2018). Methods to detect low quality data and its implication for psychological research. Behavior Research Methods, 50(6), 2586–2596. doi: 10.3758/s13428-018-1035-6
  2. Chandler, J., & Paolacci, G. (2017). Lie for a dime: When most prescreening responses are honest but most study participants are impostors. Social Psychological and Personality Science, 8(5), 500–508. doi: 10.1177/1948550617698203
  3. Clifford, S., & Jerit, J. (2014). Is there a cost to convenience? An experimental comparison of data quality in laboratory and online studies. Journal of Experimental Political Science, 1(2), 120–131. doi: 10.1017/xps.2014.5
  4. Credé, M. (2010). Random responding as a threat to the validity of effect size estimates in correlational research. Educational and Psychological Measurement, 70, 596–612. doi: 10.1177/0013164410366686
  5. Dennis, S., Goodson, B., & Pearson, C. (2019). Online worker fraud and evolving threats to the integrity of MTurk data: A discussion of virtual private servers and the limitations of IP-based screening procedures. SSRN Electronic Journal. doi: 10.2139/ssrn.3233954
  6. Downs, J., Holbrook, M., Sheng, S., & Cranor, L. (2010). Are your participants gaming the system? Screening Mechanical Turk workers. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (pp. 2399–2402). doi: 10.1145/1753326.1753688
  7. Dupuis, M., Meier, E., & Cuneo, F. (2018). Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices. Behavior Research Methods, 1–10. doi: 10.3758/s13428-018-1103-y
  8. Finley, A., & Penningroth, S. (2015). Online versus in-lab: Pros and cons of an online prospective memory experiment. In A. M. Columbus (Ed.), Advances in psychology research (pp. 135–161). New York, NY: Nova.
  9. Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19(4), 25–42. doi: 10.1257/089533005775196732
  10. Hauser, D., & Schwarz, N. (2015). It’s a trap! Instructional manipulation checks prompt systematic thinking on “tricky” tasks. SAGE Open, 5(2), 215824401558461. doi: 10.1177/2158244015584617
  11. Jones, M., House, L., & Gao, Z. (2015). Respondent screening and revealed preference axioms: Testing quarantining methods for enhanced data quality in Web panel surveys. Public Opinion Quarterly, 79(3), 687–709. doi: 10.1093/poq/nfv015
  12. Krosnick, J. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5, 213–236.
  13. Krosnick, J., Narayan, S., & Smith, W. (1996). Satisficing in surveys: Initial evidence. New Directions for Evaluation, 1996(70), 29–44. doi: 10.1002/ev.1033
  14. Liu, M., & Wronski, L. (2018). Trap questions in online surveys: Results from three web survey experiments. International Journal of Market Research, 60(1), 32–49. doi: 10.1177/1470785317744856
  15. Oppenheimer, D., Meyvis, T., & Davidenko, N. (2009). Instructional manipulation checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology, 45(4), 867–872. doi: 10.1016/j.jesp.2009.03.009


Ben Howell
Founder, Psychstudio