Text2CAD: Revolutionizing CAD Design with Text Prompts

By Mark Howell, 29 September 2024 (3 min read)

Text2CAD introduces a novel data annotation pipeline that uses open-source LLMs (Large Language Models) and VLMs (Vision-Language Models) to annotate the DeepCAD dataset with text prompts of varying complexity and parametric detail. The pipeline has two stages:

  1. Shape Description Generation: Using a VLM (LLaVA-NeXT), the system generates a basic shape description.

  2. Multi-Level Textual Annotation Generation: Using an LLM (Mixtral-50B), the system creates detailed parametric instructions.
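The two stages above can be sketched as a small Python function. This is only an illustration of the data flow, not the authors' code: the models are passed in as plain callables, and the level names, the `rendered_views` input, and every function and field name here are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical level names. The article only says that the prompts vary
# in complexity and parametric detail.
LEVELS: List[str] = ["simple", "moderate", "detailed"]


@dataclass
class AnnotatedModel:
    model_id: str
    shape_description: str
    prompts: Dict[str, str]  # one text prompt per level


def annotate(
    model_id: str,
    rendered_views: List[str],
    cad_parameters: Dict[str, float],
    describe_shape: Callable[[List[str]], str],
    write_prompt: Callable[[str, Dict[str, float], str], str],
) -> AnnotatedModel:
    # Stage 1: the VLM turns views of the model into a basic shape description.
    description = describe_shape(rendered_views)
    # Stage 2: the LLM expands the description into one prompt per level.
    prompts = {
        level: write_prompt(description, cad_parameters, level)
        for level in LEVELS
    }
    return AnnotatedModel(model_id, description, prompts)


# Stand-ins for the real models, so the sketch runs without any dependencies.
def fake_vlm(views: List[str]) -> str:
    return f"a flat ring seen in {len(views)} views"


def fake_llm(description: str, params: Dict[str, float], level: str) -> str:
    if level == "simple":
        return f"Create {description}."
    return f"Create {description} with parameters {params} ({level})."


result = annotate(
    "0001", ["front.png", "top.png"], {"outer_radius": 0.5}, fake_vlm, fake_llm
)
for level, prompt in result.prompts.items():
    print(level, "->", prompt)
```

In a real pipeline, `fake_vlm` and `fake_llm` would be replaced by calls to the actual models. Keeping them as parameters makes the two stages easy to test separately.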

Text2CAD Transformer

The Text2CAD Transformer is an end-to-end Transformer-based autoregressive architecture designed to generate CAD design history from input text prompts. The model operates as follows:

  • Input: A text prompt \(T\) and a CAD subsequence \(\mathbf{C}_{1:t-1}\) of length \(t-1\).

  • Text Embedding: The text embedding \(T_{adapt}\) is extracted from \(T\) using a pretrained BERT encoder followed by a trainable adaptive layer.

  • CAD Sequence Embedding: The resulting embedding \(T_{adapt}\) and the CAD sequence embedding \(F^0_{t-1}\) are passed through \(\mathbf{L}\) decoder blocks to generate the full CAD sequence in an autoregressive manner.
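To make "autoregressive" concrete, here is a minimal sketch of the generation loop in Python. It is not the Text2CAD model: `next_token` stands in for the decoder blocks, and the special tokens, the `max_len` cap, and the toy token names are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

START, END = "<start>", "<end>"  # hypothetical special tokens


def generate_cad_sequence(
    text_embedding: Sequence[float],
    next_token: Callable[[Sequence[float], List[str]], str],
    max_len: int = 64,
) -> List[str]:
    """Build a CAD token sequence one token at a time.

    Each step sees the text embedding plus everything generated so far,
    which is what makes the process autoregressive.
    """
    sequence: List[str] = [START]
    while len(sequence) < max_len:
        token = next_token(text_embedding, sequence)
        sequence.append(token)
        if token == END:
            break
    return sequence


# Toy stand-in for the decoder: it replays a fixed script based on how
# many tokens exist so far, and ignores the text embedding.
def toy_next_token(text_embedding: Sequence[float], prefix: List[str]) -> str:
    script = ["line", "line", "arc", "extrude", END]
    return script[len(prefix) - 1]


seq = generate_cad_sequence([0.1, 0.2], toy_next_token)
print(seq)
```

The `max_len` cap is a common safeguard in loops like this, so generation still stops if the model never emits an end token.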

Visual Results

Visual examples demonstrate the effectiveness of Text2CAD in generating 3D CAD models from varied prompts. For instance:

  • Ring-like Model: Three different prompts yield the same ring-like model, some without explicitly mentioning 'ring'.

  • Star-shaped Model: Three diverse prompts result in the same star-shaped model, each emphasizing different star characteristics.

Quantitative Results

The performance of Text2CAD was evaluated using two strategies:

  1. CAD Sequence Evaluation: This assesses the parametric correspondence between the generated CAD sequences and the input texts using the following metrics:
    - F1 Scores: For Line, Arc, Circle, and Extrusion using the method proposed in CAD-SIGNet.
    - Chamfer Distance (CD): Measures geometric alignment between the ground truth and reconstructed CAD models of Text2CAD and DeepCAD.
    - Invalidity Ratio (IR): Measures the invalidity of the reconstructed CAD models.

  2. Visual Inspection: The outputs of Text2CAD and DeepCAD were compared through GPT-4 and human evaluation.
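For readers who have not met these metrics before, the Python sketch below shows the general idea behind each one. These are textbook-style definitions chosen for illustration, not the exact procedures used in the evaluation: the F1 function takes already-counted matches and does not reproduce the CAD-SIGNet matching step, the Chamfer Distance is one common variant (mean squared nearest-neighbor distance, summed over both directions), and the Invalidity Ratio is assumed to be the share of generated models that fail to build.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall for one primitive type."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance between two non-empty point sets."""

    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # For every source point, squared distance to its nearest
        # neighbor in the other set, averaged over the source set.
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def invalidity_ratio(is_valid: Sequence[bool]) -> float:
    """Fraction of generated models that are invalid (non-empty input)."""
    return sum(1 for ok in is_valid if not ok) / len(is_valid)


# Small made-up inputs, only to show how the functions are called.
arc_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
cd = chamfer_distance(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)],
)
ir = invalidity_ratio([True, True, False, True])
print(arc_f1, cd, ir)
```

Higher F1 is better, while lower Chamfer Distance and lower Invalidity Ratio are better. The brute-force nearest-neighbor search is fine for a handful of points but would be replaced by a spatial index for real point clouds.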

Acknowledgement

This work was partially supported by the EU Horizon Europe Framework under grant agreement 101135724 (LUMINOUS).


Remember these 3 key ideas for your startup:

  • Leverage AI for Efficiency: Text2CAD demonstrates how AI can streamline the design process by converting text prompts into detailed CAD models. This can significantly reduce the time and effort required for product design, allowing startups to focus on innovation and market entry.

  • Adopt Advanced Data Annotation Techniques: The novel data annotation pipeline used by Text2CAD leverages both LLMs and VLMs to generate multi-level text prompts. Startups can adopt similar techniques to enhance their data processing capabilities, leading to more accurate and efficient outcomes.

  • Utilize Transformer Architectures: The Text2CAD Transformer showcases the power of Transformer-based architectures in generating complex outputs from simple inputs. Startups can explore Transformer models to automate and improve various aspects of their operations, from customer service to product development.



