How to improve Validity in a survey instrument

Todd P Chang
Divisional Director for Research & Scholarship, Associate Fellowship Director, Division of Emergency Medicine at Children's Hospital Los Angeles
Todd is the Director of Technology for the INSPIRE Network with David Kessler & Marc Auerbach. Todd was formerly the Faculty Advisor and Head Site Administrator for the website's predecessor, pemfellows.com. He lives in Pasadena, CA, with his partner and 2 ridiculous cats that are probably sitting on his head as you read this.

Ugh, I hate that word, Validity.  It's complex enough to warrant a PhD-level thesis, and common enough to make your survey-based studies challenging.  Here's a quick snapshot based on Cook & Beckman's article, Current concepts in validity and reliability for psychometric instruments: theory and application.  Let's first define Validity.  Validity refers to the degree to which a particular instrument's data and conclusions actually measure what the instrument is designed and purported to measure.  In other words, an instrument that measures raw intelligence is 'valid' if it actually does just that in a given population.  A few caveats:

1. Validity is contextual.  Modern science has given the at-home pregnancy test a consistent, proven ability to predict pregnancy in women of child-bearing age.  We may double-check it every once in a while, but when it turns positive, we generally trust it.  Change the context, however, and that validity is threatened.  Does a positive result (on the exact same test) in a 4-year-old girl allow the valid conclusion that she is pregnant?  How about in a 22-year-old man?  Clinically, we know these patients are instead at risk of a germ cell tumor.  The point is that validity is not a feature of the instrument alone; it's not like sensitivity or specificity.  It depends on context, and simply transplanting an instrument shown to be valid in one context into another does not guarantee validity.  A test that works well for assessing post-graduate fellows may not be a valid test for medical students.

2. Validity isn't an all-or-nothing proposition.  It is better conceptualized as a spectrum, just like everything else in medicine these days.  An instrument or survey isn't simply yes-valid or no-not-valid; there are incremental procedures you can take to cultivate a higher level of validity for the context in which you want to use the instrument.  Let's walk through those procedures.

According to Cook & Beckman's article, there are 5 salient sources of evidence that build Validity.

1. CONTENT: Do the instrument items completely represent the construct?  The construct is exactly what you're trying to measure.  An attitude towards infant LPs?  A code captain's communication skills?  A predictive score for appendicitis?  To improve content evidence, make sure multiple experts weigh in on your instrument; one way to summarize their feedback is sketched below.
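Cook & Beckman don't prescribe a particular statistic here, but one common way to quantify expert review is an item-level content validity index (I-CVI): the proportion of experts who rate each item as relevant to the construct.  A minimal sketch, with hypothetical item names and made-up 1-4 relevance ratings (none of this comes from the article):

```python
# Hypothetical expert relevance ratings (1 = not relevant ... 4 = highly relevant).
# Item names and numbers are illustrative only.
ratings = {
    "item_1_comfort_with_infant_LP": [4, 4, 3, 4],
    "item_2_wording_under_revision": [2, 3, 2, 3],
    "item_3_perceived_risk":         [4, 3, 4, 4],
}

for item, expert_ratings in ratings.items():
    relevant = sum(1 for r in expert_ratings if r >= 3)   # experts who rated the item 3 or 4
    i_cvi = relevant / len(expert_ratings)                # item-level content validity index
    print(f"{item}: I-CVI = {i_cvi:.2f}")

# Items with a low I-CVI are the ones your expert panel doesn't agree belong;
# revise or drop them before piloting.
```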

2. RESPONSE PROCESS: The relationship between the intended construct and the thought processes of subjects or observers.  This means the intended users need to test-drive your instrument, so you can make sure that what they're thinking and interpreting is what you intend.  Small things like vague or broadly worded questions, or items wide open to interpretation, will ruin Response Process.  So pilot, pilot, and pilot the instrument again, with the goal of improving its clarity.

3. INTERNAL STRUCTURE: Acceptable reliability and factor structure.  This is fairly easy to conceptualize: your instrument should produce consistent results, whether across different observers, the same person over time, or multiple people separated geographically.  A quick reliability check is sketched below.
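For the "same person over time" and "different observers" pieces you would look at test-retest and inter-rater reliability; for how well the items hang together, a common statistic is Cronbach's alpha.  Here's a minimal sketch, assuming your responses sit in a pandas DataFrame with one column per item (the data below are made up):

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Internal consistency: alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = items.dropna()
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up 5-point Likert responses from 7 hypothetical respondents:
responses = pd.DataFrame({
    "q1": [4, 5, 3, 4, 2, 5, 4],
    "q2": [4, 4, 3, 5, 2, 5, 3],
    "q3": [5, 5, 2, 4, 1, 4, 4],
})
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
# Values in the roughly 0.7-0.9 range are usually read as acceptable internal consistency.
```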

4. RELATIONS TO OTHER VARIABLES: Correlation with scores from another instrument assessing the same construct.  This is fairly common in educational aptitude instruments or attitudes surveys, particularly when a full demographics information is also sought.  For example, a test on clinical skills in residents should ideally have significant, incremental score improvements as they go up in post-graduate year:  you’re comparing the ability of the test score to tease out high vs. low performers against a naturally occurring variable of experience.  If so, Yahtzee!  If not, something is wrong with your assessment.  Or even your residents!
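A hedged sketch of that check (the scores and post-graduate years are invented): correlate test scores with post-graduate year and see whether they climb.

```python
from scipy.stats import spearmanr

pgy    = [1, 1, 1, 2, 2, 2, 3, 3, 3]             # post-graduate year of each resident
scores = [62, 58, 70, 71, 75, 69, 84, 80, 88]    # hypothetical clinical-skills test scores

rho, p_value = spearmanr(pgy, scores)            # rank correlation: does score rise with experience?
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")

# A strong positive rho supports "relations to other variables";
# a flat or negative rho means the instrument (or the residents!) deserves a closer look.
```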

5. CONSEQUENCES: Do the scores really make a difference?  This may be a hard one to prove, but consequences such as licensing, employment, or certification decisions can also add to validity.  An effect on patient outcomes, for example, would be huge.

The next time you plan a survey-based study, think of these 5 elements, make provisions to improve as many facets of Validity as possible, and use the Methods section of your abstract or manuscript to document your steps.  It makes your instrument stronger, your data better, and your conclusions more valid.  Good luck!
