Publications

LongEval: Longitudinal Evaluation of Model Performance at CLEF 2023

Published in Springer Nature Switzerland, 2023

In this paper, we describe the plans for the first LongEval CLEF 2023 shared task, dedicated to evaluating the temporal persistence of Information Retrieval (IR) systems and text classifiers. The task is motivated by recent research showing that the performance of these models drops as the test data becomes more temporally distant from the training data. LongEval differs from traditional shared IR and classification tasks by giving special consideration to evaluating models that aim to mitigate performance drop over time. We envisage that this task will draw the attention of the IR community and NLP researchers to the problem of temporal persistence of models: what enables or prevents it, potential solutions, and their limitations.

Recommended citation: Alkhalifa, R., Bilal, I., Borkakoty, H., Camacho-Collados, J., Deveaud, R., El-Ebshihy, A., ... & Zubiaga, A. (2023, March). LongEval: Longitudinal Evaluation of Model Performance at CLEF 2023. In Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III (pp. 499-505). Cham: Springer Nature Switzerland. https://link.springer.com/chapter/10.1007/978-3-031-28241-6_58

Building for tomorrow: Assessing the temporal persistence of text classifiers

Published in Information Processing & Management, 2022

The performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. The ability to predict how well a model will persist over time can therefore help design models that remain effective for longer. In this paper, we provide a thorough discussion of the problem and establish an evaluation setup for the task. We look at this problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years and involving diverse tasks and types of data. By splitting the longitudinal datasets into years, we perform a comprehensive set of experiments training and testing across data that are different numbers of years apart from each other, both in the past and in the future. This enables a gradual investigation into the impact of the temporal gap between training and test sets on classification performance, as well as measuring the extent of persistence over time.
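The following is a minimal sketch of the longitudinal protocol described above: train on one year's split, test on every other year, and track performance as a function of the temporal gap. The dataset dictionary and the TF-IDF/logistic-regression classifier are illustrative placeholders, not the models evaluated in the paper.

```python
# Sketch of the longitudinal evaluation protocol: train on one year,
# test on every other year, and average performance per temporal gap.
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline


def evaluate_temporal_gaps(yearly_data):
    """yearly_data: dict mapping year (int) -> (texts, labels)."""
    scores_by_gap = defaultdict(list)
    for train_year, (train_x, train_y) in yearly_data.items():
        # Placeholder classifier; the paper compares a range of models.
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(train_x, train_y)
        for test_year, (test_x, test_y) in yearly_data.items():
            gap = test_year - train_year  # negative = testing on the past
            preds = model.predict(test_x)
            scores_by_gap[gap].append(f1_score(test_y, preds, average="macro"))
    # Average macro-F1 per temporal gap, to see how performance decays.
    return {gap: sum(v) / len(v) for gap, v in sorted(scores_by_gap.items())}
```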

Recommended citation: Alkhalifa, R., Kochkina, E., & Zubiaga, A. (2022). Building for tomorrow: Assessing the temporal persistence of text classifiers. Information Processing & Management. https://www.sciencedirect.com/science/article/pii/S0306457322003016

Capturing stance dynamics in social media: open challenges and research directions

Published in International Journal of Digital Humanities, 2022

Social media platforms provide a goldmine for mining public opinion on issues of wide societal interest and impact. Opinion mining is a problem that can be operationalised by capturing and aggregating the stance of individual social media posts as supporting, opposing or being neutral towards the issue at hand. While most prior work in stance detection has investigated datasets covering short periods of time, interest in longitudinal datasets has recently increased. Evolving dynamics in the linguistic and behavioural patterns observed in new data require adapting stance detection systems to deal with these changes. In this survey paper, we investigate the intersection between computational linguistics and the temporal evolution of human communication in digital media. We perform a critical review of emerging research on these dynamics, exploring different semantic and pragmatic factors that impact linguistic data in general, and stance in particular. We further discuss current directions in capturing stance dynamics in social media, the challenges encountered when dealing with them, and identify open challenges and future directions in three key dimensions: utterance, context and influence.

Recommended citation: Alkhalifa, R., & Zubiaga, A. (2022). Capturing stance dynamics in social media: open challenges and research directions. International Journal of Digital Humanities, 3, 115–135. https://doi.org/10.1007/s42803-022-00043-w

Opinions are Made to be Changed: Temporally Adaptive Stance Classification

Published in OASIS '21: Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks, 2021

Given the rapidly evolving nature of social media and people’s views, word usage changes over time. Consequently, the performance of a classifier trained on old textual data can drop dramatically when tested on newer data. While research in stance classification has advanced in recent years, no effort has been invested in making these classifiers maintain their performance over time. To study this phenomenon, we introduce two novel large-scale, longitudinal stance datasets. We then evaluate the performance persistence of stance classifiers over time and demonstrate how it decays as the temporal gap between training and testing data increases. We propose a novel approach to mitigate this performance drop, which is based on temporal adaptation of the word embeddings used for training the stance classifier. This enables us to make use of readily available unlabelled data from the current time period instead of expensive annotation efforts. We propose and compare several approaches to embedding adaptation and find that the Incremental Temporal Alignment (ITA) model leads to the best results in reducing performance drop over time.
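As a rough illustration of the embedding-adaptation idea (not the paper's ITA implementation), the sketch below aligns word embeddings from a newer time period to an older one with Orthogonal Procrustes, using the shared vocabulary as anchors; applying this step period by period gives an incremental alignment.

```python
# Hedged sketch: rotate new-period word vectors into the old period's space
# with Orthogonal Procrustes, the general idea behind temporal alignment.
import numpy as np
from scipy.linalg import orthogonal_procrustes


def align_embeddings(old_vecs, new_vecs, shared_vocab):
    """old_vecs / new_vecs: dict word -> vector of equal dimension.
    shared_vocab: words present in both periods, used as anchors."""
    A = np.stack([new_vecs[w] for w in shared_vocab])
    B = np.stack([old_vecs[w] for w in shared_vocab])
    R, _ = orthogonal_procrustes(A, B)  # R minimises ||A @ R - B||_F
    # Apply the rotation to every word in the new period, not just anchors.
    return {w: v @ R for w, v in new_vecs.items()}
```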

Recommended citation: Rabab Alkhalifa, Elena Kochkina, and Arkaitz Zubiaga. 2021. Opinions are Made to be Changed: Temporally Adaptive Stance Classification. In Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks (OASIS '21). Association for Computing Machinery, New York, NY, USA, 27–32. https://doi.org/10.1145/3472720.3483620

QMUL-SDS @ DIACR-ITA: Evaluating Unsupervised Diachronic Lexical Semantics Classification in Italian

Published in EVALITA Evaluation of NLP and Speech Tools for Italian: Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian Final Workshop, 2020

In this paper, we present the results and main findings of our system for the DIACR-Ita 2020 task. Our system focuses on using variations of training sets and different semantic change detection methods. The task involves training, aligning and predicting a word’s vector change across two diachronic Italian corpora. We demonstrate that, in terms of accuracy, using Temporal Word Embeddings with a Compass (TWEC) with a C-BOW model is more effective than alternative approaches, including Logistic Regression and a Feed-Forward Neural Network. Our model ranked 3rd with an accuracy of 83.3%.
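For illustration only, the snippet below shows the kind of change-detection step such a system ends up performing: given word vectors trained on the two diachronic corpora and projected into a shared space (e.g. via a TWEC-style compass), a word is flagged as semantically changed when the cosine similarity between its two vectors falls below a threshold. The threshold value is an assumption, not the one used in our system.

```python
# Hedged illustration of binary semantic change detection between two periods.
import numpy as np


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def detect_change(vecs_t0, vecs_t1, targets, threshold=0.5):
    """Return {word: 1 if changed, 0 otherwise} for each target word.
    vecs_t0 / vecs_t1: word -> vector in a shared (aligned) space.
    threshold: illustrative value, would be tuned on development data."""
    return {w: int(cosine(vecs_t0[w], vecs_t1[w]) < threshold) for w in targets}
```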

Recommended citation: Alkhalifa, R., Tsakalidis, A., Zubiaga, A., & Liakata, M. (2020). QMUL-SDS @ DIACR-Ita: Evaluating Unsupervised Diachronic Lexical Semantics Classification in Italian. Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online. CEUR.org. https://www.aaccademia.it/scheda-libro?aaref=1423

QMUL-SDS @ SardiStance: Leveraging Network Interactions to Boost Performance on Stance Detection using Knowledge Graphs

Published in EVALITA Evaluation of NLP and Speech Tools for Italian: Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian Final Workshop, 2020

This paper presents our submission to the SardiStance 2020 shared task, describing the architecture used for Task A and Task B. While our submission for Task A did not exceed the baseline, retraining our model on all the training tweets showed promising results, reaching an f-avg of 0.601 using a bidirectional LSTM with multilingual BERT embeddings. For Task B, we ranked 6th (f-avg 0.709). With further investigation, our best experimental settings increased performance from f-avg 0.573 to f-avg 0.733 with the same architecture and parameter settings, after only incorporating social interaction features, highlighting the impact of social interaction on the model’s performance.
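As a hedged sketch of the kind of architecture described above (not the submitted system), the model below runs a bidirectional LSTM over multilingual BERT token embeddings and concatenates social interaction features before the classifier; layer sizes and the social-feature dimension are illustrative assumptions.

```python
# Illustrative BiLSTM-over-mBERT stance classifier with social features.
import torch
import torch.nn as nn
from transformers import AutoModel


class StanceBiLSTM(nn.Module):
    def __init__(self, n_classes=3, social_dim=16, hidden=128):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        # Sentence representation (both LSTM directions) plus social features.
        self.classifier = nn.Linear(2 * hidden + social_dim, n_classes)

    def forward(self, input_ids, attention_mask, social_feats):
        token_embs = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        _, (h_n, _) = self.lstm(token_embs)
        sent = torch.cat([h_n[-2], h_n[-1]], dim=-1)  # forward + backward states
        return self.classifier(torch.cat([sent, social_feats], dim=-1))
```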

Recommended citation: Alkhalifa, R., & Zubiaga, A. (2020). QMUL-SDS @ SardiStance: Leveraging Network Interactions to Boost Performance on Stance Detection using Knowledge Graphs (short paper). In EVALITA. https://www.aaccademia.it/scheda-libro?aaref=1423

QMUL-SDS at CheckThat! 2020: determining COVID-19 tweet check-worthiness using an enhanced CT-BERT with numeric expressions

Published in Working Notes of CLEF 2020 – Conference and Labs of the Evaluation Forum, 2020

This paper describes the participation of the QMUL-SDS team for Task 1 of the CLEF 2020 CheckThat! shared task. The purpose of this task is to determine the check-worthiness of tweets about COVID-19 to identify and prioritise tweets that need fact-checking. The overarching aim is to further support ongoing efforts to protect the public from fake news and help people find reliable information. We describe and analyse the results of our submissions. We show that a CNN using COVID-Twitter-BERT (CT-BERT) enhanced with numeric expressions can effectively boost performance from baseline results. We also show results of training data augmentation with rumours on other topics. Our best system ranked fourth in the task with encouraging outcomes showing potential for improved results in the future.
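The sketch below is an illustrative take on such a pipeline: numeric expressions are marked in the tweet text, and a small CNN classifies check-worthiness on top of CT-BERT token embeddings. The model identifier and the regex-based numeric marking are assumptions for illustration, not the exact submitted configuration.

```python
# Hedged sketch: mark numeric expressions, then classify with a CNN over
# COVID-Twitter-BERT token embeddings.
import re

import torch
import torch.nn as nn
from transformers import AutoModel


def mark_numeric_expressions(text):
    # Crudely wrap numbers (counts, percentages) so they are made explicit,
    # e.g. "500 cases" -> "<num> 500 </num> cases".
    return re.sub(r"\d[\d,.%]*", lambda m: f"<num> {m.group(0)} </num>", text)


class CTBertCNN(nn.Module):
    def __init__(self, n_filters=100, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.bert = AutoModel.from_pretrained("digitalepidemiologylab/covid-twitter-bert-v2")
        dim = self.bert.config.hidden_size
        self.convs = nn.ModuleList(nn.Conv1d(dim, n_filters, k) for k in kernel_sizes)
        self.out = nn.Linear(n_filters * len(kernel_sizes), 2)  # check-worthy or not

    def forward(self, input_ids, attention_mask):
        embs = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = embs.transpose(1, 2)                      # (batch, dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.out(torch.cat(pooled, dim=-1))
```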

Recommended citation: Alkhalifa, R., Yoong, T., Kochkina, E., Zubiaga, A., & Liakata, M. (2020). QMUL-SDS at CheckThat! 2020: determining COVID-19 tweet check-worthiness using an enhanced CT-BERT with numeric expressions. CLEF (2020). https://link.springer.com/chapter/10.1007/978-3-030-58219-7_17