Test pages for validating page content extraction (PCE) and form context capture. Each page presents a different HTML structure to test how Readability, Turndown, and our noise filters handle real-world patterns.
Clean article with headings, paragraphs, lists, and images. No forms, no noise. Baseline for content extraction.
Article content with a single contact form embedded in the middle. Tests form context extraction alongside page content.
Page with three different forms: newsletter signup (top), contact form (middle), demo request (bottom). Tests multi-form extraction with different contexts.
Pricing tiers with a "Get a Demo" form. Tests extraction of structured pricing content and form in commercial context.
Long-form blog article with author byline, publish date, code blocks, blockquotes, and a comments section. Tests content extraction quality on editorial content.
Landing page with sticky header, notification bar, cookie banner, chat widget mock, newsletter popup, social sharing, and floating CTA. Tests noise filtering.
E-commerce product detail page with JSON-LD structured data, image gallery, specs table, reviews, and "Notify Me" form. Tests structured data extraction.
FAQ accordion with expandable answers and a support ticket form. Tests extraction of Q&A content and form context in a support setting.
Almost empty page with just a title and one sentence. Tests how extraction handles pages with very little content.
Page dominated by forms: login, registration, search, feedback, and newsletter. Minimal text content. Tests form extraction when forms ARE the content.
One visible form and one hidden (display:none) form. Tests whether extraction correctly handles visibility.
Content loaded dynamically via JavaScript after page load. Tests whether Readability captures dynamically rendered content.
Blog index with 8-10 article cards, a sidebar, and no single dominant article. Tests whether Readability picks only one article and whether querySelector('article') returns only the first.
Product page with 4 competing JSON-LD blocks: Organization, BreadcrumbList, Product, and FAQPage. Tests whether querySelector grabs only the first block.
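One way to avoid the first-block problem is to read every JSON-LD script and index the parsed objects by @type. The sketch below is only an illustration, not the project's extraction code: indexJsonLd is a hypothetical helper, and it takes the raw script texts as plain strings so it can run outside a browser.

```javascript
// Hypothetical helper (not part of the project): indexes JSON-LD objects by @type.
// In a browser the input could be gathered with:
//   Array.from(document.querySelectorAll('script[type="application/ld+json"]'),
//              (s) => s.textContent)
function indexJsonLd(scriptTexts) {
  const byType = new Map();
  for (const text of scriptTexts) {
    let parsed;
    try {
      parsed = JSON.parse(text);
    } catch {
      continue; // skip a malformed block instead of losing the others
    }
    // A block may hold a single object, an array, or an @graph wrapper.
    let items;
    if (Array.isArray(parsed)) {
      items = parsed;
    } else if (parsed && Array.isArray(parsed["@graph"])) {
      items = parsed["@graph"];
    } else {
      items = [parsed];
    }
    for (const item of items) {
      if (item === null || typeof item !== "object") continue;
      // @type may be a string or an array of strings.
      for (const type of [].concat(item["@type"] ?? [])) {
        if (!byType.has(type)) byType.set(type, []);
        byType.get(type).push(item);
      }
    }
  }
  return byType;
}
```

With the four blocks on this page, a caller could then ask for byType.get("Product") directly instead of depending on which block appears first in the markup.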
Checkout form where fields are scattered outside the <form> tag using the form= attribute. Tests whether extractFormContext misses the associated fields.
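Fields tied to a form through the form= attribute are not descendants of the <form> element, so a descendant query such as form.querySelectorAll('input') will not see them. The standard form.elements collection does include them. The sketch below assumes nothing about how extractFormContext is actually written; listFormFields is a hypothetical name, and it only relies on the object having an iterable elements property, so it can be exercised with a stub.

```javascript
// Hypothetical helper: lists named controls via form.elements, which covers
// controls associated through the form= attribute as well as nested ones.
function listFormFields(form) {
  return Array.from(form.elements)
    .filter((el) => el.name) // drop unnamed controls such as bare submit buttons
    .map((el) => ({ name: el.name, type: el.type }));
}
```

In a browser, listFormFields(document.getElementById('checkout')) would be a typical call, where 'checkout' is an assumed form id.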
Email-style content in nested layout tables mixed with one actual data table. Tests Turndown converting layout tables to garbled pipe-delimited markdown.
React/Vue style page where form and reviews are rendered by JavaScript after a delay. Tests extraction firing before content exists in DOM.
Modern signup form with NO <label> elements. Fields use only aria-label, aria-labelledby, and placeholder. Tests extractFieldSummary missing aria attributes.
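A label lookup that falls back through the ARIA attributes could look like the sketch below. It is an assumption about one reasonable approach, not a description of extractFieldSummary: resolveFieldLabel is a hypothetical name, and the priority order (aria-labelledby, aria-label, <label>, placeholder) loosely follows the accessible name computation rather than reproducing it.

```javascript
// Hypothetical helper: returns the best available label text for a field.
// `field` needs getAttribute() and optionally `labels`; `doc` needs getElementById().
function resolveFieldLabel(field, doc) {
  const clean = (s) => (s || "").replace(/\s+/g, " ").trim();

  // 1. aria-labelledby: a space-separated list of element ids.
  const ids = clean(field.getAttribute("aria-labelledby"));
  if (ids) {
    const text = ids
      .split(" ")
      .map((id) => {
        const el = doc.getElementById(id);
        return el ? clean(el.textContent) : "";
      })
      .filter(Boolean)
      .join(" ");
    if (text) return text;
  }

  // 2. aria-label: the label text is carried on the field itself.
  const aria = clean(field.getAttribute("aria-label"));
  if (aria) return aria;

  // 3. Native <label> elements, when present.
  const labels = field.labels ? Array.from(field.labels) : [];
  const labelText = clean(labels.map((l) => l.textContent).join(" "));
  if (labelText) return labelText;

  // 4. Placeholder as a last resort; empty string when nothing is available.
  return clean(field.getAttribute("placeholder"));
}
```

In a browser the call would be resolveFieldLabel(input, document). Placeholder text is often an example value rather than a label, so a caller may want to mark that fallback as lower confidence.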
4-step wizard with ~20 fields in DOM but only 4-6 visible at a time. Hidden steps use display:none. Tests visibility check on form vs individual fields.
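When the form itself is visible but most of its steps are not, the visibility check has to run per field and take ancestors into account. The following is a minimal sketch under that assumption; isRendered and visibleFields are hypothetical names, and the style lookup is injected so the logic can run without a browser.

```javascript
// Hypothetical helper: true when neither the element nor any ancestor is display:none
// and the element's own computed visibility is not hidden.
// In a browser, pass getStyle = (node) => getComputedStyle(node).
function isRendered(el, getStyle) {
  // visibility is inherited, so the element's own computed value is sufficient.
  if (getStyle(el).visibility === "hidden") return false;
  // display:none is not inherited as a computed value, so walk up the tree.
  for (let node = el; node; node = node.parentElement) {
    if (getStyle(node).display === "none") return false;
  }
  return true;
}

// Keeps only the fields a user could currently see.
function visibleFields(fields, getStyle) {
  return fields.filter((field) => isRendered(field, getStyle));
}
```

Applied to the wizard, this would report only the fields of the active step rather than all ~20 fields in the DOM.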
Tabbed content with 3 tabs and 6-item accordion, all collapsed. Substantial hidden content plus a form inside an accordion panel.
Gutenberg block patterns: wp-block-cover, gallery, columns, pullquote, and empty wp-block-latest-posts. Tests comment artifacts and empty dynamic blocks.
Infographic with large inline SVGs containing <text> elements with data labels. Tests SVG text leaking into markdown and performance with many path elements.
Full-screen consent overlay with non-standard class names (sp_message_container, qc-cmp2-container). Tests unrecognized CMP classes polluting output.
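A class-based consent filter could combine an explicit list of vendor class names with a generic token match, as sketched below. This is an illustration only: the two listed class names come from this test page, while the generic tokens (cmp, consent, cookie, gdpr) and the function name looksLikeConsentContainer are assumptions, not the project's actual noise filter.

```javascript
// Vendor-specific class names that a generic pattern would not catch on its own.
const KNOWN_CMP_CLASSES = new Set(["sp_message_container", "qc-cmp2-container"]);

// Assumed generic tokens; matched against whole hyphen- or underscore-separated parts.
const GENERIC_CMP_TOKEN = /^(cmp\d*|consent|cookies?|gdpr)$/i;

// Hypothetical helper: takes an element's class attribute value as a string.
function looksLikeConsentContainer(className) {
  const classes = className.split(/\s+/).filter(Boolean);
  return classes.some(
    (cls) =>
      KNOWN_CMP_CLASSES.has(cls) ||
      cls.split(/[-_]/).some((token) => GENERIC_CMP_TOKEN.test(token))
  );
}
```

Note that sp_message_container contains no consent-related token at all, which is why an explicit list is still needed. The generic tokens can also produce false positives (a recipe page with a cookie-recipe class, for example), so they are best treated as a hint.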
Rich text compose interface using contenteditable div, custom dropdown selects, and submit button outside any <form> tag. Invisible to form extraction.
Product reviews with heavy emoji, Arabic RTL text, CJK characters, math symbols, and ZWJ emoji sequences. Tests truncation and sentence boundary detection.
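Truncating by string length counts UTF-16 code units, so a cut can land inside a surrogate pair or in the middle of a ZWJ emoji sequence. One option is to count grapheme clusters with the built-in Intl.Segmenter, as in this sketch. truncateGraphemes is a hypothetical helper, and whether the project truncates this way is not stated here.

```javascript
// Hypothetical helper: keeps at most maxGraphemes user-perceived characters and
// appends an ellipsis only when something was actually cut off.
function truncateGraphemes(text, maxGraphemes) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let out = "";
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (count === maxGraphemes) return out + "…";
    out += segment; // a segment may span several code units, e.g. a ZWJ sequence
    count += 1;
  }
  return out;
}
```

The same API accepts granularity: "sentence", which may be a starting point for sentence boundary detection in mixed-script text, although its behavior on a given review corpus would need to be checked.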
Product gallery with mixed image types: good alt, empty alt, missing alt, lazy-loaded, and <picture> element. Plus CSS background-image hero.
CSS-only hero slider with 4 slides (only 1 visible) and a product card carousel. Tests multiple hidden slides and image-only slides producing empty content.
YouTube iframe, Vimeo iframe, native <video>, and VideoObject JSON-LD. Tests iframe stripping losing video context while preserving surrounding text.
Page where all visuals use CSS background-image: hero banner, feature icons, testimonial section, product thumbs. Tests visual content invisible to extraction.