Text2CAD: Revolutionizing CAD Design with Text Prompts

BY Mark Howell | 29 September 2024 | 3 MINS READ

Text2CAD introduces a data annotation pipeline that uses open-source LLMs (Large Language Models) and VLMs (Vision-Language Models) to annotate the DeepCAD dataset with text prompts of varying complexity and parametric detail. The pipeline has two stages:

  1. Shape Description Generation: Using a VLM (LLaVA-NeXT), the system generates a basic shape description.

  2. Multi-Level Textual Annotation Generation: Using an LLM (Mixtral-50B), the system creates detailed parametric instructions.
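
The two stages above can be sketched as a small function that chains a VLM call into an LLM call. This is only an illustrative outline, not the authors' code: the function names, the use of rendered views as VLM input, the `cad_parameters` dictionary, and the level names are all assumptions, and the two models are passed in as plain callables so the sketch runs without any model installed.

```python
from typing import Callable, Dict, Sequence


def annotate_cad_model(
    rendered_views: Sequence[str],
    cad_parameters: dict,
    describe_shape: Callable[[Sequence[str]], str],
    write_prompt: Callable[[str, dict, str], str],
    levels: Sequence[str],
) -> Dict[str, str]:
    """Run both annotation stages for one CAD model (hypothetical interface)."""
    # Stage 1: the VLM produces a basic shape description.
    shape_description = describe_shape(rendered_views)
    # Stage 2: the LLM combines that description with the CAD parameters
    # and writes one text prompt per requested level of detail.
    return {
        level: write_prompt(shape_description, cad_parameters, level)
        for level in levels
    }


# Stand-in callables (assumptions) so the example is runnable as-is.
def fake_vlm(views):
    return f"a ring seen in {len(views)} views"


def fake_llm(description, params, level):
    return f"[{level}] Draw {description} with radius {params['radius']}."


prompts = annotate_cad_model(
    rendered_views=["front.png", "top.png"],
    cad_parameters={"radius": 0.5},
    describe_shape=fake_vlm,
    write_prompt=fake_llm,
    levels=["simple", "detailed"],
)
for level, prompt in prompts.items():
    print(level, "->", prompt)
```

In a real setup, `describe_shape` and `write_prompt` would wrap calls to the VLM and LLM; keeping them as parameters makes it easy to swap models or test the pipeline logic in isolation.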

Text2CAD Transformer

The Text2CAD Transformer is an end-to-end Transformer-based autoregressive architecture designed to generate CAD design history from input text prompts. The model operates as follows:

  • Input: A text prompt \(T\) and a CAD subsequence \(\mathbf{C}_{1:t-1}\) of length \(t-1\).

  • Text Embedding: The text embedding \(T_{adapt}\) is extracted from \(T\) using a pretrained BERT encoder followed by a trainable adaptive layer.

  • CAD Sequence Embedding: The resulting embedding \(T_{adapt}\) and the CAD sequence embedding \(F^0_{t-1}\) are passed through \(\mathbf{L}\) decoder blocks to generate the full CAD sequence in an autoregressive manner.
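
To make "autoregressive" concrete, the loop below generates a token sequence one step at a time, feeding the prefix produced so far back into the model together with the text embedding. It is a minimal sketch under stated assumptions: `next_token` stands in for the decoder blocks, greedy one-token-at-a-time selection and the start/end token convention are illustrative choices, and none of the names come from the Text2CAD code.

```python
from typing import Callable, List, Sequence


def generate_cad_sequence(
    text_embedding: Sequence[float],
    next_token: Callable[[Sequence[float], List[int]], int],
    start_token: int,
    end_token: int,
    max_length: int,
) -> List[int]:
    """Generate CAD tokens one at a time, conditioned on text and prefix."""
    sequence = [start_token]
    while len(sequence) < max_length:
        # The (hypothetical) model sees the text embedding and the
        # subsequence C_{1:t-1}, and predicts the token at step t.
        token = next_token(text_embedding, sequence)
        sequence.append(token)
        if token == end_token:
            break
    return sequence


# Toy stand-in for the decoder: counts upward, then emits the end token 0.
def toy_next_token(text_embedding, prefix):
    return prefix[-1] + 1 if prefix[-1] < 4 else 0


tokens = generate_cad_sequence(
    [0.1, 0.2], toy_next_token, start_token=1, end_token=0, max_length=16
)
short = generate_cad_sequence(
    [0.1, 0.2], toy_next_token, start_token=1, end_token=0, max_length=3
)
print(tokens)  # [1, 2, 3, 4, 0]
print(short)   # [1, 2, 3]
```

The `max_length` guard matters in practice: a model that never emits the end token would otherwise loop forever.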

Visual Results

Visual examples demonstrate the effectiveness of Text2CAD in generating 3D CAD models from varied prompts. For instance:

  • Ring-like Model: Three different prompts yield the same ring-like model, some without explicitly mentioning 'ring'.

  • Star-shaped Model: Three diverse prompts result in the same star-shaped model, each emphasizing different star characteristics.

Quantitative Results

The performance of Text2CAD was evaluated using two strategies:

  1. CAD Sequence Evaluation: This assesses the parametric correspondence between the generated CAD sequences and the input texts using the following metrics:
    - F1 Scores: For Line, Arc, Circle, and Extrusion using the method proposed in CAD-SIGNet.
    - Chamfer Distance (CD): Measures geometric alignment between the ground truth and reconstructed CAD models of Text2CAD and DeepCAD.
    - Invalidity Ratio (IR): Measures the invalidity of the reconstructed CAD models.

  2. Visual Inspection: The outputs of Text2CAD and DeepCAD were compared using GPT-4 and human evaluation.
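
For readers unfamiliar with the three sequence-level metrics, here is a rough sketch of how each can be computed. The definitions are common textbook forms and are assumptions, not the evaluation code used for Text2CAD: Chamfer Distance has several variants (this one sums the two mean squared nearest-neighbor distances), and the primitive matching step that CAD-SIGNet uses to obtain the F1 counts is not reproduced here.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(points_a: Sequence[Point], points_b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance between two point clouds (one common variant)."""
    a_to_b = sum(
        min(squared_distance(p, q) for q in points_b) for p in points_a
    ) / len(points_a)
    b_to_a = sum(
        min(squared_distance(q, p) for p in points_a) for q in points_b
    ) / len(points_b)
    return a_to_b + b_to_a


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 from match counts, e.g. for one primitive type such as Line or Arc."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def invalidity_ratio(is_valid: Sequence[bool]) -> float:
    """Fraction of generated models that could not be built into valid CAD."""
    return sum(1 for ok in is_valid if not ok) / len(is_valid)


# Small hand-made inputs (illustrative values only, not results from the paper).
same = chamfer_distance([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)])
shifted = chamfer_distance([(0, 0, 0)], [(1, 0, 0)])
line_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
ir = invalidity_ratio([True, True, False, True])
print(same, shifted, round(line_f1, 3), ir)
```

Lower is better for Chamfer Distance and Invalidity Ratio, while higher is better for F1.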

Acknowledgement

This work was partially supported by the EU Horizon Europe Framework under grant agreement 101135724 (LUMINOUS).


Remember these 3 key ideas for your startup:

  • Leverage AI for Efficiency: Text2CAD demonstrates how AI can streamline the design process by converting text prompts into detailed CAD models. This can significantly reduce the time and effort required for product design, allowing startups to focus on innovation and market entry.

  • Adopt Advanced Data Annotation Techniques: The novel data annotation pipeline used by Text2CAD leverages both LLMs and VLMs to generate multi-level text prompts. Startups can adopt similar techniques to enhance their data processing capabilities, leading to more accurate and efficient outcomes.

  • Utilize Transformer Architectures: The Text2CAD Transformer showcases the power of Transformer-based architectures in generating complex outputs from simple inputs. Startups can explore Transformer models to automate and improve various aspects of their operations, from customer service to product development.


