Does Active Learning Reduce Human Coding?: A Systematic Comparison of Neural Network with nCoder
Jaeyoon Choi, Andrew Ruis, Zhiqiang Cai, Brendan Eagan and David Williamson Shaffer
In quantitative ethnography (QE) studies, which often involve large datasets that cannot be entirely hand-coded by human raters, researchers have used supervised machine learning approaches to develop automated classifiers. However, QE researchers are rightly concerned with the amount of human coding that may be required to develop classifiers that achieve the high levels of accuracy that QE studies typically require. In this study, we compare a neural network, a powerful traditional supervised learning approach, with nCoder, an active learning technique commonly used in QE studies, to determine which technique requires less human coding to produce a sufficiently accurate classifier. To do this, we constructed multiple training sets from a large dataset used in prior QE studies and designed a Monte Carlo simulation to test the performance of the two techniques systematically. Our results show that nCoder can achieve high predictive accuracy with significantly less human-coded data than a neural network.
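The Monte Carlo comparison described above can be sketched as a harness that repeatedly draws random training sets of a given size, trains a classifier on each, and reports the smallest size whose mean accuracy reaches a target. This is a minimal sketch under stated assumptions, not the authors' design: neither nCoder nor a neural network is reproduced here, `train_keyword_classifier` is a hypothetical toy stand-in, plain accuracy substitutes for whatever agreement metric (for example, Cohen's kappa) a study actually uses, and all function names are illustrative.

```python
import random
from statistics import mean


def accuracy(predict, test_set):
    """Share of (text, label) pairs in test_set that predict() labels correctly."""
    return sum(predict(text) == label for text, label in test_set) / len(test_set)


def mean_accuracy(train, pool, test_set, n, reps, rng):
    """Average accuracy over `reps` training sets of size n drawn at random from pool."""
    return mean(accuracy(train(rng.sample(pool, n)), test_set) for _ in range(reps))


def smallest_sufficient_size(train, pool, test_set, sizes, threshold, reps=50, seed=0):
    """Smallest training-set size whose mean accuracy reaches threshold, else None."""
    rng = random.Random(seed)
    for n in sorted(sizes):
        if mean_accuracy(train, pool, test_set, n, reps, rng) >= threshold:
            return n
    return None


def train_keyword_classifier(examples):
    """Hypothetical toy classifier: words seen only in positive examples mark a positive."""
    pos = {w for text, label in examples if label == 1 for w in text.split()}
    neg = {w for text, label in examples if label == 0 for w in text.split()}
    keywords = pos - neg
    return lambda text: int(any(w in keywords for w in text.split()))
```

Any trainer with the same shape (labeled examples in, predict function out) can be passed as `train`, which is what lets two techniques be compared under identical random draws.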
Learning through feedback: Understanding early-career teachers’ learning using online video platforms
This study examines the patterns of feedback among early-career elementary mathematics teachers participating in an online inquiry group focused on the practice of number talk routines. Number talk routines are instructional practices designed to help develop students’ computational fluency in ways that promote flexible number sense. Teachers facilitate these discussions using responsive teaching practices that elicit student thinking and highlight how students’ strategies relate to each other and to key mathematical concepts. Data for this study come from time-stamped feedback comments that members of the inquiry group posted to correspond with specific moments in each participant’s number talk routine. Epistemic network analysis was used to examine patterns in the form and content of feedback over time. The results suggest that early-career teachers became more reflective in their feedback, connecting their own practices to the work of others and focusing more on teachers’ decision making that supported the enactment of responsive teaching practices.
Automated Code Extraction from Discussion Board Text Dataset
Sina Mahdipour Saravani, Sadaf Ghaffari, Yanye Luther, James Folkestad and Marcia Moraes
This study investigates the capabilities of three text mining approaches (Latent Semantic Analysis, Latent Dirichlet Allocation, and Clustering Word Vectors) to automate code extraction from a relatively small discussion board dataset. We compare the output of each algorithm with a version of the dataset previously coded manually by two human raters. The results show that even with a small dataset, automated approaches can assist course instructors by extracting some of the discussion codes.
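Of the three approaches, Latent Semantic Analysis is the simplest to illustrate: build a term-document count matrix, take a truncated singular value decomposition, and compare documents in the reduced space. The sketch below assumes whitespace tokenization and raw counts (no weighting or stop-word removal); it does not cover Latent Dirichlet Allocation or word-vector clustering and is not the authors' pipeline. The function names are illustrative.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw term counts: rows are vocabulary terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1
    return X, vocab


def lsa_document_vectors(X, k):
    """Truncated SVD X ~ U_k S_k V_k^T; each document becomes a row of V_k S_k."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T * s[:k]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Documents that share vocabulary end up close together in the latent space. Grouping posts by these vectors is one way candidate codes could be surfaced for comparison with human coding.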
What makes a good answer? Analyzing the Content Structure of Stack Overflow Answers
Luis Morales-Navarro and Amanda Barany
As the need to learn computer programming grows, Stack Overflow provides a popular and practical community where software developers ask and answer coding questions. Users rank answers to evaluate their quality. For newcomers who want to answer questions, this process can be challenging, because they must learn what this online community expects of an answer. In this paper, we analyzed the content structure of the answers posted to Stack Overflow’s most highly ranked question to understand the characteristics of answers that the Stack Overflow community values. Using epistemic network analysis, we analyzed answers to the question “Why is processing a sorted array faster than processing an unsorted array?” Network models showed that answer content differed qualitatively between high- and low-ranked answers: high-ranked answers included general explanations and code examples that contextualized question-specific code and explanations. We discuss how these findings could be used to better support and scaffold novices in crafting their answers.
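A comparison of high- and low-ranked answers like this one can be illustrated with an ENA-style difference network. Each answer's coded segments are accumulated into counts of code pairs that co-occur within a moving window, each answer's vector is scaled to unit length, each group's vectors are averaged, and the two group means are subtracted. This is a simplified sketch, not the authors' model. The code names are hypothetical and only loosely follow the abstract. ENA's dimensional reduction and node placement are omitted, and ENA tools differ in accumulation details, such as whether a connection must involve the current segment.

```python
import math
from itertools import combinations

# Hypothetical codes, loosely named after the abstract's findings.
CODES = ["GeneralExplanation", "CodeExample", "SpecificCode", "SpecificExplanation"]
PAIRS = list(combinations(CODES, 2))


def unit_network(segments, window=2):
    """Count code pairs co-present in a moving window of segments (sets of codes),
    then scale the vector to unit length (zero vector if there are no connections)."""
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(segments)):
        present = set().union(*segments[max(0, i - window + 1): i + 1])
        for a, b in PAIRS:
            if a in present and b in present:
                counts[(a, b)] += 1
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {p: (v / norm if norm else 0.0) for p, v in counts.items()}


def mean_network(units, window=2):
    """Average the normalized networks of all units (answers) in a group."""
    nets = [unit_network(u, window) for u in units]
    return {p: sum(net[p] for net in nets) / len(nets) for p in PAIRS}


def difference_network(group_a, group_b, window=2):
    """Positive weights: connections stronger in group_a; negative: stronger in group_b."""
    mean_a = mean_network(group_a, window)
    mean_b = mean_network(group_b, window)
    return {p: mean_a[p] - mean_b[p] for p in PAIRS}
```

Normalizing each unit before averaging keeps long answers from dominating the group mean simply because they contain more segments.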
Community At a Distance: Understanding Student Interactions in Course-Based Online Discussion Forums
Jennifer Scianna, Monique Woodard, Beatriz Galarza, Seiyon Lee, Rogers Kaliisa and Hazel Vega Quesada
Online discussion forums are often used as a point of contact between students and their instructors in college courses. While asynchronous discourse has proven effective for learning, it remains unclear whether student interactions also manifest in socially constructive ways in addition to providing cognitive benefits. In this paper, we consider the social dimension of student interactions within a Canvas course discussion forum. In particular, we examine how instructional context shapes the connections among the indicators that constitute social presence within the Community of Inquiry framework. For the analysis, data were collected from two instances of the same course: one taught in a hybrid format and the other in a remote format. The results of the epistemic network analysis reveal that elements of social presence manifest differently in hybrid and fully remote modalities: the remote modality yielded more interconnected, balanced networks than the hybrid modality. The findings suggest that online discussion is conducive to collaborative inquiry, mediated by social presence, when pedagogical decisions work with the instructional modality to support student-to-student interaction.
Ordered Network Analysis
Yuanru Tan, Andrew Ruis, Cody Marquart, Zhiqiang Cai, Mariah Knowles and David Williamson Shaffer
Collaborative Problem Solving (CPS) is a socio-cognitive process that is interactive, interdependent, and temporal. As individuals interact with each other, information is added to the common ground, or the current state of a group’s shared understanding, which in turn influences individuals’ subsequent responses to the common ground. Therefore, to model CPS processes, especially in contexts where the order of events is hypothesized to be meaningful, it is important to account for order. In this study, we present Ordered Network Analysis (ONA), a method that not only models the ordered aspect of CPS but also supports visual and statistical comparison of ONA networks. To demonstrate the analytical affordances and interpretable visualizations of ONA, we analyzed the collaborative discourse of air defense warfare teams. We found that ONA captured qualitative differences between the control and experimental conditions that unordered models cannot capture, and we confirmed that these differences were statistically significant.
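The core idea of an ordered model can be shown with directed connection counts. For each line (the response), every code in the preceding lines of the window (the ground) is connected to every code in the current line, so a connection from A to B is counted separately from B to A. This is a minimal sketch of that idea only. It omits ONA's normalization, dimensional reduction, visualization, and any special handling of self-connections. The code names and the function names are hypothetical.

```python
from collections import Counter


def ordered_connections(lines, window=3):
    """Directed counts from codes in the preceding window-1 lines (the ground)
    to codes in the current line (the response). Each line is a set of codes."""
    counts = Counter()
    for i, response in enumerate(lines):
        ground = set().union(*lines[max(0, i - window + 1): i])
        for src in ground:
            for dst in response:
                counts[(src, dst)] += 1
    return counts


def unordered(counts):
    """Collapse direction, as an unordered model would."""
    merged = Counter()
    for (src, dst), n in counts.items():
        merged[tuple(sorted((src, dst)))] += n
    return merged
```

Running both functions on the same sequence and on its reverse shows the point made in the abstract. The ordered counts differ between the two sequences, while the unordered counts are identical, so an unordered model cannot distinguish them.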
Modeling Collaborative Discourse with ENA using a Probabilistic Function
Yeyu Wang, Andrew Ruis and David Williamson Shaffer
Models of collaborative learning need to account for interdependence, the ways in which collaborating individuals construct shared understanding by making connections to one another’s contributions to the collaborative discourse. To operationalize these connections, researchers have proposed two approaches: (1) counting connections based on the presence or absence of events within a temporal window of fixed length, and (2) weighting connections using the probability of one event referring to another. Although most QE researchers use fixed-length windows to model collaborative interdependence, this may result in miscounting connections due to the variability of the appropriate relational context for a given event. To address this issue, we compared epistemic network analysis (ENA) models using both a window function (ENA-W) and a probabilistic function (ENA-P) to model collaborative discourse in an educational simulation of engineering design practice. We conducted a pilot study to compare ENA-W and ENA-P based on (1) interpretive alignment, (2) goodness of fit, and (3) explanatory power, and found that while ENA-P performs slightly better than ENA-W, both ENA-W and ENA-P are feasible approaches for modeling collaborative learning.
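The contrast between the two approaches can be sketched as two weighting schemes over the lines preceding the current one. A window function gives full weight to the previous window-1 lines and zero to anything earlier. A probabilistic function gives each earlier line the probability that the current line refers to it. The geometric decay used below, and its `decay=0.5` rate, are assumptions chosen purely for illustration. The abstract does not specify ENA-P's actual function, and all names here are hypothetical.

```python
import math


def window_weights(n_prev, window):
    """ENA-W style: each of the previous window-1 lines gets full weight 1."""
    return [1.0 if d <= window - 1 else 0.0 for d in range(1, n_prev + 1)]


def probabilistic_weights(n_prev, decay=0.5):
    """Hypothetical ENA-P style: the probability that the current line refers to
    the line d steps back decays exponentially (an assumption); weights sum to 1."""
    raw = [math.exp(-decay * (d - 1)) for d in range(1, n_prev + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def connection_strength(lines, i, a, b, weights):
    """Strength of the connection from code a in line i back to code b, where
    weights[d - 1] applies to the line d steps before line i (lines are sets of codes)."""
    if a not in lines[i]:
        return 0.0
    score = sum(w for d, w in enumerate(weights, start=1) if b in lines[i - d])
    # Caps window counts at 1 (binary presence); probabilities already sum to <= 1.
    return min(score, 1.0)
```

With a short window, a reference to an event just outside the window is missed entirely, and a longer window counts it at full strength. The probabilistic weighting gives it a partial weight instead, which is the kind of miscounting the abstract describes.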
LSTM Neural Network Assisted Regex Development for Qualitative Coding
Zhiqiang Cai, Brendan Eagan, Cody Marquart and David Williamson Shaffer
Automated qualitative coding based on regular expressions (regex) reduces the effort researchers spend manually coding text data without sacrificing the transparency of the coding process. However, researchers using regex-based approaches struggle with low recall, or a high false negative rate, during classifier development. Advanced natural language processing techniques, such as topic modeling, latent semantic analysis, and neural network classification models, help address this problem in various ways. The latest advance in this direction is the discovery of the so-called "negative reversion set" (NRS), in which false negative items appear more frequently than in the negative set as a whole. This helps regex classifier developers identify missed items more quickly and thus improve classification recall. This paper simulates the use of NRS in real coding scenarios and compares the number of items that must be manually coded under NRS sampling and under random sampling during classifier refinement. Results on one dataset with 50,818 items and six associated qualitative codes show that, on average, NRS sampling could reduce the required manual coding by 50% to 63% compared with random sampling.
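Why sampling from a set enriched in false negatives saves coding effort can be seen in a small simulation. The simulation counts how many items a coder must review, in random order, before finding k false negatives in a given pool. This is a minimal sketch that does not model how an NRS is actually constructed. The pool sizes and false-negative rates in the usage below are made-up numbers, and the function names are hypothetical.

```python
import random


def make_pool(size, fn_rate):
    """Pool of unreviewed items; True marks a false negative the regex missed."""
    n_fn = round(size * fn_rate)
    return [True] * n_fn + [False] * (size - n_fn)


def items_to_find(pool, k, rng):
    """Review items in random order until k false negatives have been found."""
    found = 0
    for reviewed, is_fn in enumerate(rng.sample(pool, len(pool)), start=1):
        found += is_fn
        if found == k:
            return reviewed
    return None  # the pool holds fewer than k false negatives


def mean_items(pool, k, reps=100, seed=0):
    """Monte Carlo estimate of items hand-coded to find k false negatives.
    Assumes the pool contains at least k false negatives."""
    rng = random.Random(seed)
    return sum(items_to_find(pool, k, rng) for _ in range(reps)) / reps
```

For example, `mean_items(make_pool(10_000, 0.02), 10)` comes out near 500 items, while a hypothetical enriched pool, `make_pool(2_000, 0.10)`, needs roughly 100. The expected effort scales inversely with the false-negative rate of the pool being sampled.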
Leveraging Epistemic Network Analysis to Discern the Development of Shared Understanding between Physicians and Nurses
Vitaliy Popov, Raeleen Sobetski, Taylor Jones, Luke Granberg, Kiara Turvey and Milisa Manojlovich
In healthcare settings, poor communication between physicians and nurses is one of the most common causes of adverse events. This study used Epistemic Network Analysis to help identify communication patterns in physician-nurse dyad interactions. We used existing video data in which physicians made patient care rounds on two oncology patient units at a large academic medical center; the video recordings captured conversations physicians had with nurses about the plan of care. All data were transcribed, segmented, and annotated using the Verbal Response Mode (VRM) taxonomy. The results showed that the relationship between Edification and Disclosure was strongest for the dyads that reached a shared understanding, suggesting that these two modes are important for reaching shared understanding during patient care rounds. Reflection and Interpretation were the least used VRM codes, which suggests one possible area for intervention development. This pilot study provides new insight into how ENA, coupled with the VRM taxonomy, can inform efforts to improve communication between physicians and nurses.
Is QE Just ENA?
David Williamson Shaffer and Andrew Ruis
In the emerging field of quantitative ethnography (QE), epistemic network analysis (ENA) has featured prominently, to the point where multiple scholars in the QE community have asked some variation on the question: Is QE just ENA? This paper is an attempt to address this question systematically. We review arguments that QE should be considered a background and justification for using ENA, as well as arguments that ENA should be considered merely one approach to implementing QE ideas. We conclude that ENA is used in QE, but not exclusively, and that QE uses ENA, but not exclusively. More important than the answer to this question, however, is the reflexive thinking about methodology that has been a key focus of the QE community. Our hope is that, rather than offering a definitive answer to this question, this paper provides some ways to think about the relationships between theory, methods, and analytic techniques as the QE community continues to grow.
The Role of Data Simulation in Quantitative Ethnography
Zachari Swiecki and Brendan Eagan
Data simulations are powerful analytic tools that give researchers a great degree of control over data collection and experimental design. Despite these advantages, data simulations have not yet been used as widely as other quantitative ethnographic techniques. In this paper, we explore the reasons for this and use examples of recent work to argue that data simulations can, and already do, play an important role in quantitative ethnography.