D7Z-Menu V2

We evaluate D7Z-Menu V2 on three datasets:

    Early work used CNNs for layout analysis. More recent transformer-based models take two routes: LayoutLM augments an OCR-text encoder with 2-D layout embeddings, while Donut uses an OCR-free encoder-decoder structure to map pixels directly to text sequences.

    The core innovation of V2 lies in the decoding phase. Let $X$ be the image embedding and $Y = (y_1, y_2, \dots, y_T)$ be the target token sequence (a JSON string).
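To make the target sequence $Y$ concrete, the following is a minimal sketch of how a menu hierarchy could be serialized into a JSON string and split into tokens. The field names (`sections`, `title`, `dishes`, `name`, `price`), the sample dishes, and the character-level tokenization are all illustrative assumptions; the source does not specify the schema or the tokenizer.

```python
import json

# Hypothetical menu hierarchy: dishes nested under a section, each with a price.
# The field names are assumptions made for this example only.
menu = {
    "sections": [
        {
            "title": "Starters",
            "dishes": [
                {"name": "Tomato soup", "price": "4.50"},
                {"name": "Garlic bread", "price": "3.00"},
            ],
        }
    ]
}

# Serialize compactly so the target string contains no optional whitespace.
target = json.dumps(menu, separators=(",", ":"))

# Character-level split, standing in for a real subword tokenizer.
tokens = list(target)
T = len(tokens)  # sequence length T in Y = (y_1, ..., y_T)

print(target)
print("T =", T)
```

Because the target is plain JSON, a structural error such as a premature closing brace makes the whole string unparseable, which is why the decoding step needs to respect the schema.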

    In standard VLMs, the probability $P(Y \mid X)$ is modeled autoregressively as $\prod_{t=1}^{T} P(y_t \mid y_{<t}, X)$. In D7Z-Menu V2, we introduce a Refinement Gate $G$ at each step $t$:

    $$ P(y_t \mid y_{<t}, X) = \text{Softmax}(W \cdot h_t + \lambda \cdot G(h_t)) $$

    Here $G(h_t)$ scores how well each candidate token adheres to a pre-defined "Menu Schema" (e.g., ensuring a price token follows a dish name token). If the model attempts to generate a structural closing bracket `}` prematurely or hallucinates a non-existent field, the gate lowers the probability of that token, forcing the decoder to "refine" its choice in real time.
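The gated decoding step can be illustrated with a small sketch. It assumes that $W \cdot h_t$ is already available as a vector of vocabulary logits and that the gate returns one additive score per vocabulary entry (0 for schema-conforming tokens, negative for discouraged ones). The four-token vocabulary, the hand-written `schema_gate` rule, and all numeric values are hypothetical; in the paper $G$ is a function of $h_t$, not a lookup on the previous token.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_step(base_logits, gate_scores, lam):
    """Compute Softmax(base_logits + lam * gate_scores) for one decoding step."""
    if len(base_logits) != len(gate_scores):
        raise ValueError("logits and gate scores must have the same length")
    return softmax([z + lam * g for z, g in zip(base_logits, gate_scores)])

# Toy vocabulary (assumption): two valid fields, a closing brace, one invalid field.
VOCAB = ['"name"', '"price"', '}', '"chef_mood"']

def schema_gate(prev_token):
    """Hypothetical stand-in for G: additive scores per vocabulary entry."""
    if prev_token == '"name"':
        # After a dish name the schema expects a price; repeating the name,
        # closing the object early, or emitting an unknown field is penalized.
        return [-2.0, 0.0, -4.0, -6.0]
    return [0.0, 0.0, 0.0, -6.0]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

base_logits = [0.1, 1.0, 1.2, 0.3]  # stands in for W . h_t
gate = schema_gate('"name"')

ungated = gated_step(base_logits, gate, lam=0.0)  # plain softmax
gated = gated_step(base_logits, gate, lam=1.0)    # gate applied

print("ungated choice:", VOCAB[argmax(ungated)])
print("gated choice:  ", VOCAB[argmax(gated)])
```

With $\lambda = 0$ the step reduces to the ordinary softmax and the toy decoder prefers the premature `}`; with $\lambda = 1$ the gate shifts the choice to the price field. Larger values of $\lambda$ push the distribution closer to a hard schema constraint.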

    Abstract. The digitization of menu images remains a critical challenge in Document Intelligence, primarily due to complex spatial layouts, diverse typography, and implicit semantic hierarchies (e.g., dishes nested under sections with pricing attributes). Existing Vision-Language Models (VLMs) often hallucinate in zero-shot settings or fail to preserve the exact spatial hierarchies required for automated ordering systems. This paper introduces D7Z-Menu V2, a novel framework that uses a Decoder-Driven Zero-Refinement mechanism. Unlike traditional OCR-pipeline approaches, D7Z-Menu V2 treats menu parsing as a conditional generation task constrained by a structural grammar schema. We demonstrate that by shifting the refinement burden entirely to the decoder phase, without external retrieval augmentation, our model achieves state-of-the-art performance on the MenuOCR benchmark, significantly reducing structural errors while maintaining semantic integrity.

