Principles of Scientific Inquiry into the Study of Language Acquisition and the Language Sciences
Primary research in the area of language and the language sciences must be held to the same rigorous scientific principles as any other science. In addition, it faces unique challenges.
Our digital age brings both new possibilities and new challenges for developing research in this area. Cyberinfrastructure developments can now be brought to bear on primary and secondary research in the field of language acquisition and the related language sciences.
As research in this area achieves new power through strengthened collaborations, the scientific quality and validity of primary research become more important than ever. Shared scientific principles must underlie these collaborations and these new research developments.
We must now cultivate not only new research infrastructures but new learning infrastructures as well: ones that are distributed and can span both time and space, crossing not only lab boundaries but national and language boundaries too.
Interdisciplinary research teams are necessary in order to pursue the most significant questions in the study of language acquisition today. Our digital age now enables this in the form of ‘virtual communities’, raising new promises and new challenges.
Collaboration within and across disciplines becomes paramount in this new culture. Virtual communities must establish principles for collaboration involving shared materials, data and analyses, principles that both expand the possibilities open to any individual researcher and protect researchers and their work.
In the typical model for empirical work in language acquisition, the people who collect and code the data are the same people who generate the hypotheses, design the experiment, and later analyze the data and present results to the wider community. The Virtual Center seeks to put data collected for a given study to new and further use in studies that build on the original work in some new way. For new work that makes use of old data to be sound and successful, a key principle to keep in mind is that those who conduct the new study are, de facto, collaborating with those who conducted the original study.
Researchers conducting empirical work in language acquisition are often familiar with the sensation of “knowing the data.” The growth of a research study is often organic: the data collected do in fact confirm or fail to confirm a hypothesis, but the researchers conducting the study also have a deeper understanding of the data that frequently leads to new ideas for further work, reveals strengths or shortcomings of the method for the study at hand, and provides a grounding for further discussion or development of the results.
The importance of “knowing the data” is so great that when a group of researchers collaborates on data collection, weekly meetings are often necessary, with frequent communications between meetings, to ensure consistency and to keep everyone fully aware of what is happening during data collection. It is even common for members of such groups to feel they have a better understanding of the parts of the data set they themselves collected or coded than they do of the parts that others collected and coded.
“Knowing the data” involves a range of observations that extends beyond what is formally recorded. Examples include knowledge of the circumstances under which data were collected, knowledge of the subjects and of what a particular interactive session of data collection was like, knowledge of the particular way coding decisions were reached, and knowledge of which features of the observable material were deemed important for coding.
None of this knowledge constitutes grounds for scientific conclusions about the data, but it frequently plays an important role as the analysis proceeds and as questions are asked. A principle for the successful use of the Virtual Center is to enable the transfer of knowledge of the data to the largest degree possible, and to protect the validity of scientific discoveries against the possibility that people drawing the conclusions do not have access to a key piece of information about the way the data were gathered or coded.
[As a hypothetical example, in a study where the data involve production of speech, transcribers using something other than the International Phonetic Alphabet (IPA) may transcribe two subtly different pronunciations of verb endings in the same way. The difference may be irrelevant if the hypotheses and conclusions of the original study did not deal with verbal inflection, but a researcher using the data for a different purpose might draw a conclusion that the data do not in fact substantiate unless that researcher prepared a new transcription of the original audio data.]
There is an obvious ethical application of these principles:
Collaborators must be acknowledged, and contributors of data to the Virtual Center should be able to designate the specific publications that later work on the same data should cite. They should also be able to specify other acknowledgments that should be made in certain circumstances. (For example, some researchers may deem that data sources should be acknowledged not only indirectly through citation of an earlier study, but directly by name in each publication that results from the data.)
There is also a longer list of scientific applications of these principles. Examples include:
- Researchers should become familiar with all documentation and information about Virtual Center data before using the data, even if the proposed new study does not directly relate to some of the documentation and data.
- The Virtual Center should enable ongoing discussion of ways to capture formally the various kinds of knowledge that the researchers who collect and code data have about their work.
- When a new analysis of data yields conclusions that are at odds in some way with the original work, the new work should discuss the original research goals and conclusions in some detail, and its authors should where possible seek opportunities to discuss the new analysis with the original researchers. [The possibility at issue here is distinct from a failure to replicate original results under the same conditions: the new work probably tests different hypotheses or addresses the data in a new way.]