B2F: End-to-End Body-to-Face Motion Generation with Style Reference

Bokyung Jang, Eunho Jung, Yoonsang Lee
To appear in

Abstract

Human motion naturally integrates body movements and facial expressions, forming a unified perception [Hu et al. 2020]. If a virtual character’s facial expression remains static despite body movements, it may weaken the perception of the character as a cohesive whole. Motivated by this, we propose B2F, a model that generates facial motions aligned with body movements. B2F takes a facial style reference as input, generating facial animations that reflect the provided style while maintaining consistency with the associated body motion. It outputs facial motions in the FLAME format [Li et al. 2017], making them directly applicable to SMPL-X characters [Pavlakos et al. 2019]. Moreover, with our proposed FLAME-to-ARKit converter module, B2F extends its compatibility to stylized characters with ARKit blendshapes. Our model is trained end-to-end, extracting the shared content information between body and facial motions through an alignment loss while effectively disentangling content and style information using consistency and cross-consistency losses. To evaluate B2F’s performance, we conducted a perceptual study and demonstrated its effectiveness through qualitative assessments, including its application to various animated characters. The results show that with B2F-generated facial motion, observers perceive the character as more realistic, engaging, and expressive compared to a static face.
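To illustrate how the three losses named above could fit together, here is a minimal PyTorch sketch. It is not the authors' implementation: the encoder/decoder shapes, feature dimensions, and exact loss forms are illustrative assumptions, shown only to clarify the roles of the alignment, consistency, and cross-consistency terms.

```python
# Hedged sketch of an alignment + consistency + cross-consistency objective.
# All module sizes and loss forms below are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=128):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

BODY_DIM, FACE_DIM, LATENT = 69, 53, 64        # assumed feature sizes

body_content_enc = mlp(BODY_DIM, LATENT)       # content code from body motion
face_content_enc = mlp(FACE_DIM, LATENT)       # content code from facial motion
style_enc        = mlp(FACE_DIM, LATENT)       # style code from a facial reference
face_dec         = mlp(2 * LATENT, FACE_DIM)   # (content, style) -> facial motion

body  = torch.randn(8, BODY_DIM)               # batch of body motion features
face  = torch.randn(8, FACE_DIM)               # paired facial motion features
style = torch.randn(8, FACE_DIM)               # facial style references

c_body = body_content_enc(body)
c_face = face_content_enc(face)
s      = style_enc(style)

# Alignment loss: body and facial motion from the same clip should share content.
loss_align = F.mse_loss(c_body, c_face)

# Consistency loss: re-encoding the decoded face should recover the style code.
recon = face_dec(torch.cat([c_body, s], dim=-1))
loss_consist = F.mse_loss(style_enc(recon), s)

# Cross-consistency loss: swapping styles across the batch should leave the
# content extracted from the decoded facial motion unchanged.
perm = torch.randperm(s.size(0))
crossed = face_dec(torch.cat([c_body, s[perm]], dim=-1))
loss_cross = F.mse_loss(face_content_enc(crossed), c_body.detach())

total = loss_align + loss_consist + loss_cross
total.backward()
```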

Paper

Publisher: Coming soon
arXiv: Coming soon

Video