Over the past decade, advancements in machine learning, particularly large language models like OpenAI's ChatGPT, have heightened public awareness of artificial intelligence. This growing awareness has raised expectations for intelligent products that enhance user experience, which often require the collection of substantial amounts of potentially sensitive data. However, data protection legislation such as the EU AI Act [1] and the GDPR [2] restricts the collection and storage of sensitive user data in real-world applications, making it difficult for organizations to aggregate large datasets across users or institutions; substantial transmission costs for collecting and storing the data compound the problem. Federated Learning offers a promising paradigm to address these issues by enabling decentralized learning via data minimization, thereby also reducing communication overhead. To evaluate its effectiveness, we explore a real-world sequential classification task: learning the intent of smart device users within confined areas of a vehicle. Utilizing an anonymized real-world dataset sourced from BMW's car fleet and employing a novel sequential modeling approach, STaRFormer [3], we analyze two federated learning architectures, cross-device and cross-silo, and assess two federated aggregation algorithms against various baseline models. The results demonstrate that STaRFormer outperforms the baseline models in the cross-silo case and delivers competitive results in the cross-device case, highlighting its efficacy in decentralized sequential data modeling.
The motivation for this project is rooted in the pursuit of data-driven solutions, necessitated by the abundant amounts of data generated by modern vehicles, which can reach gigabytes per hour. Under the General Data Protection Regulation (GDPR) [2], the European Union Artificial Intelligence Act [1], and other legislation, data collection is often contingent upon obtaining customer consent. Moreover, the financial implications of data storage and transmission are considerable, posing additional challenges. In the use case presented, data availability further complicates the landscape, as only around 30% of customers consent to the use of their data for development purposes. This limitation prompts the need for strategies to effectively leverage data that would otherwise be unavailable. Additionally, the diversity of regulations across countries adds layers of complexity to forming a cohesive data collection and storage strategy. These challenges collectively highlight the necessity for innovative approaches to enhance data utilization in vehicle technology development.
This project shows that local contrastive learning can improve Transformer model performance in federated learning for sequential tasks, but requires sufficient local data and careful tuning to avoid overfitting. Future work might explore methods to better balance global and local representation learning in distributed settings.
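The two aggregation algorithms compared in the tables below differ in a single, well-known way: FedAvg averages client parameters weighted by local sample counts, while FedProx additionally adds a proximal penalty to each client's local loss to limit drift from the global model. A minimal sketch of both rules follows; the function names (`aggregate_fedavg`, `fedprox_penalty`) and the coefficient `mu` are illustrative, not taken from the codebase used in this work.

```python
import numpy as np

def aggregate_fedavg(client_weights, client_sizes):
    """Server side (FedAvg): sample-size-weighted average of client parameters."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

def fedprox_penalty(local_w, global_w, mu=0.01):
    """Client side (FedProx): proximal term (mu / 2) * ||w - w_global||^2
    added to the local training loss to discourage drift from the global model."""
    return 0.5 * mu * np.sum((local_w - global_w) ** 2)

# Toy example: two clients holding 10 and 30 samples respectively.
w = aggregate_fedavg([np.array([1.0, 2.0]), np.array([3.0, 4.0])], [10, 30])
# -> 0.25 * [1, 2] + 0.75 * [3, 4] = [2.5, 3.5]
```

With `mu = 0`, FedProx reduces exactly to FedAvg-style local training, which is why the two are directly comparable in the head-to-head counts reported below.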
Statistics

| Setting | Model / Criterion | Accuracy | F0.5-Score | F1-Score | Precision | Recall | Avg. Rank (Accuracy) |
|---|---|---|---|---|---|---|---|
| Cross-Device | RNN | 0.720 | 0.706 | 0.695 | 0.726 | 0.702 | 4.375 |
| | LSTM | 0.701 | 0.624 | 0.631 | 0.624 | 0.664 | 3.375 |
| | GRU | 0.736 | 0.724 | 0.719 | 0.647 | 0.720 | 2.375 |
| | Transformer | 0.743 | 0.729 | 0.717 | 0.743 | 0.718 | 2.500 |
| | STaRFormer | 0.740 | 0.726 | 0.714 | 0.741 | 0.711 | 2.375 |
| | FedAvg vs. FedProx | 5 / 5 | 5 / 5 | 5 / 5 | 2 / 5 | 4 / 5 | - |
| | E = 1 | 0.722 | 0.680 | 0.675 | 0.658 | 0.686 | - |
| | E = 5 | 0.734 | 0.724 | 0.715 | 0.735 | 0.719 | - |
| Cross-Silo | RNN | 0.746 | 0.737 | 0.736 | 0.741 | 0.740 | 4.250 |
| | LSTM | 0.748 | 0.738 | 0.736 | 0.744 | 0.739 | 4.000 |
| | GRU | 0.743 | 0.732 | 0.730 | 0.734 | 0.732 | 3.250 |
| | Transformer | 0.758 | 0.746 | 0.741 | 0.751 | 0.743 | 3.250 |
| | STaRFormer | 0.770 | 0.758 | 0.754 | 0.762 | 0.752 | 1.250 |
| | FedAvg vs. FedProx | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | 2 / 5 | - |
| | E = 1 | 0.754 | 0.743 | 0.741 | 0.745 | 0.741 | - |
| | E = 5 | 0.752 | 0.752 | 0.738 | 0.748 | 0.741 | - |
| Setting | Aggregation | λCL | Accuracy | F0.5-Score | F1-Score | Precision | Recall |
|---|---|---|---|---|---|---|---|
| Cross-Device | FedAvg | 0.1 | 0.743 | 0.723 | 0.722 | 0.735 | 0.718 |
| | | 1.0 | 0.728 | 0.702 | 0.687 | 0.718 | 0.683 |
| | FedProx | 0.1 | 0.743 | 0.729 | 0.722 | 0.735 | 0.718 |
| | | 1.0 | 0.721 | 0.701 | 0.687 | 0.718 | 0.683 |
| Cross-Silo | FedAvg | 0.1 | 0.763 | 0.759 | 0.750 | 0.752 | 0.749 |
| | | 1.0 | 0.784 | 0.773 | 0.770 | 0.776 | 0.762 |
| | FedProx | 0.1 | 0.744 | 0.722 | 0.729 | 0.735 | 0.719 |
| | | 1.0 | 0.752 | 0.738 | 0.729 | 0.748 | 0.724 |
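The weight λCL above scales an auxiliary contrastive term relative to the supervised task loss, i.e. a combined objective of the form L = L_task + λCL · L_contrastive. The sketch below is a generic stand-in under that assumption; the InfoNCE-style loss shown here is illustrative and is not claimed to be STaRFormer's exact contrastive objective (see [3] for the actual formulation).

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE-style contrastive loss over cosine similarities
    (illustrative stand-in; the actual contrastive objective follows [3])."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits /= temperature
    # Cross-entropy with the positive pair as the target class.
    return -logits[0] + np.log(np.exp(logits).sum())

def total_loss(task_loss, contrastive_loss, lambda_cl=0.1):
    """Assumed combined objective: L = L_task + lambda_cl * L_contrastive,
    where lambda_cl corresponds to λCL in the table above."""
    return task_loss + lambda_cl * contrastive_loss
```

Under this reading, λCL = 1.0 lets the contrastive term compete equally with the task loss, which the cross-silo results reward but the cross-device results (with less local data per client) do not.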
[1] Council of the European Union, "Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (Artificial Intelligence Act)," Jun. 2024.
[2] Council of the European Union, "Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation)," 2016.
[3] M. Forstenhäusler, D. Külzer, C. Anagnostopoulos, S. P. Parambath, and N. Weber, “STaRFormer: Semi-Supervised Task-Informed Representation Learning via Dynamic Attention-Based Regional Masking for Sequential Data,” Apr. 2025, arXiv:2504.10097 [cs]. [Online]. Available: http://arxiv.org/abs/2504.10097