Michael Qian

Language-Guided Multimodal Texture Authoring


Download PDF

Abstract

Authoring realistic haptic textures often requires low-level parameter tuning and repeated trial-and-error, which slows iteration and limits creative exploration. This work introduces a language-driven authoring system that turns natural-language prompts into multimodal textures: coordinated haptic signals (sliding vibrations generated by force/speed-conditioned autoregressive models and tapping transients) and a text-prompted visual preview from a diffusion model. A shared, language-aligned latent space links the modalities so that a single prompt yields semantically consistent haptic and visual output; designers can specify goals like “gritty but cushioned” or “smooth hard metal” and immediately see and feel the result through a 3D haptic device. An anchor-referenced, attribute-wise evaluation of roughness, slipperiness, and hardness shows perceptually meaningful structure in the latent space, with trends reflecting asperity, compliance, and surface-film effects. A human-subject study further indicates a coherent cross-modal experience and low effort for prompt-based iteration. Overall, the results show that language is a practical control modality for texture authoring, enabling prompt-first, designer-oriented workflows that replace manual parameter tuning with interpretable, text-guided refinement.
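To make the pipeline described above concrete, here is a minimal sketch (not the authors' implementation) of how a prompt could be embedded into a shared latent that conditions an autoregressive sliding-vibration generator on per-step contact force and sliding speed. The class names, dimensions, the use of an embedding-bag text encoder, and the GRU-based generator are all illustrative assumptions.

```python
# Hypothetical prompt-to-vibration sketch; all names and dimensions are assumptions.
import torch
import torch.nn as nn

class PromptToLatent(nn.Module):
    """Stand-in for a language-aligned texture encoder."""
    def __init__(self, vocab_size=1000, latent_dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, latent_dim)

    def forward(self, token_ids, offsets):
        # Pool token embeddings into one shared latent per prompt.
        return self.embed(token_ids, offsets)            # (batch, latent_dim)

class VibrationAR(nn.Module):
    """Autoregressive vibration model conditioned on the latent plus force/speed."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=1 + 2 + latent_dim,
                          hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def generate(self, latent, force, speed, steps=512):
        batch = latent.size(0)
        sample = torch.zeros(batch, 1, 1)                 # previous vibration sample
        h, samples = None, []
        for t in range(steps):
            cond = torch.cat([sample,
                              force[:, t:t + 1, :],       # normal force at step t
                              speed[:, t:t + 1, :],       # sliding speed at step t
                              latent.unsqueeze(1)], dim=-1)
            y, h = self.rnn(cond, h)
            sample = self.out(y)                          # next vibration sample
            samples.append(sample)
        return torch.cat(samples, dim=1)                  # (batch, steps, 1)

# Toy usage: a short prompt, constant force and speed.
encoder, generator = PromptToLatent(), VibrationAR()
tokens = torch.tensor([1, 42, 7])                         # toy token ids for a prompt
latent = encoder(tokens, torch.tensor([0]))
force = torch.full((1, 512, 1), 0.5)
speed = torch.full((1, 512, 1), 0.1)
vibration = generator.generate(latent, force, speed)
print(vibration.shape)                                    # torch.Size([1, 512, 1])
```

In this toy setup the same latent could also condition the tapping-transient and diffusion-based visual branches, which is how a single prompt would keep the haptic and visual outputs semantically consistent.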

Contributions