Class PlaywrightDocumentLoader

java.lang.Object
dev.langchain4j.data.document.loader.playwright.PlaywrightDocumentLoader
All Implemented Interfaces:
AutoCloseable

public class PlaywrightDocumentLoader extends Object implements AutoCloseable
Utility class for loading web documents using Playwright. Each load method returns a Document containing the content of the web page.
  • Method Details

    • load

      public Document load(String url)
      Loads a Document from the specified URL by fetching its HTML content.

      This method fetches the page content using pageContent(url), then wraps it in a Document whose metadata records the source URL.

      Parameters:
      url - the URL of the web page to load; must not be null
      Returns:
      a Document instance containing the HTML content and metadata
      Throws:
      NullPointerException - if the provided url is null
      RuntimeException - if the document fails to load because the underlying content fetch failed
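
The contract described for load(String) (null check, fetch, wrap with URL metadata) can be sketched without Playwright. This is only an illustration, not the library's implementation: SketchDocument and LoadSketch are hypothetical stand-ins, the fetcher function replaces the browser-backed pageContent(url), and the metadata key "url" is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical stand-in for Document: raw text plus string metadata.
final class SketchDocument {
    final String text;
    final Map<String, String> metadata;

    SketchDocument(String text, Map<String, String> metadata) {
        this.text = text;
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }
}

// Hypothetical stand-in for the loader. The fetcher plays the role of
// pageContent(url) so the sketch runs without a browser.
final class LoadSketch {
    private final Function<String, String> fetcher;

    LoadSketch(Function<String, String> fetcher) {
        this.fetcher = Objects.requireNonNull(fetcher, "fetcher");
    }

    SketchDocument load(String url) {
        // A null url is rejected with NullPointerException, as documented.
        Objects.requireNonNull(url, "url");
        String html;
        try {
            html = fetcher.apply(url);
        } catch (RuntimeException e) {
            // Fetch failures surface as a RuntimeException that keeps the cause.
            throw new RuntimeException("Failed to load document from " + url, e);
        }
        Map<String, String> metadata = new HashMap<>();
        metadata.put("url", url); // assumed metadata key for the source URL
        return new SketchDocument(html, metadata);
    }

    public static void main(String[] args) {
        LoadSketch loader = new LoadSketch(u -> "<html><body>Hello</body></html>");
        SketchDocument doc = loader.load("https://example.com");
        System.out.println(doc.text);
        System.out.println(doc.metadata.get("url"));
    }
}
```

Injecting the fetcher as a function is a choice made for this sketch so the wrapping logic can be exercised in isolation; the real class obtains content from its own pageContent(url).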
    • load

      public Document load(String url, DocumentParser documentParser)
      Loads a Document from the specified URL by fetching its HTML content and parsing it using the provided DocumentParser.

      This method delegates content parsing to the given parser and adds the source URL to the document's metadata.

      Parameters:
      url - the URL of the web page to load; must not be null
      documentParser - the parser used to convert raw HTML content into a Document; must not be null
      Returns:
      a Document parsed from the web page content, with metadata including the URL
      Throws:
      NullPointerException - if url or documentParser is null
      RuntimeException - if the document content cannot be loaded or parsed
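
The delegation to a parser described for load(String, DocumentParser) might look roughly like the sketch below. Everything here is a stand-in: SketchParser is a hypothetical substitute for DocumentParser (assumed to consume the HTML as a stream), pageContent returns canned HTML instead of driving a browser, and the tag stripper is a deliberately naive example parser. For brevity the sketch returns the parsed text only and omits the URL metadata that the real method adds.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical stand-in for DocumentParser: turns a stream of raw HTML into text.
interface SketchParser {
    String parse(InputStream in) throws IOException;
}

final class ParserLoadSketch {
    // Stand-in for pageContent(url): returns canned HTML instead of using a browser.
    static String pageContent(String url) {
        return "<html><body><h1>Title</h1><p>Body of " + url + "</p></body></html>";
    }

    static String load(String url, SketchParser parser) {
        // Both arguments are rejected with NullPointerException when null.
        Objects.requireNonNull(url, "url");
        Objects.requireNonNull(parser, "parser");
        byte[] html = pageContent(url).getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(html)) {
            // Parsing is delegated entirely to the supplied parser.
            return parser.parse(in);
        } catch (IOException e) {
            throw new RuntimeException("Failed to parse content from " + url, e);
        }
    }

    // Naive example parser: strips tags and collapses whitespace.
    static String stripTags(InputStream in) throws IOException {
        String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        return html.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints: Title Body of https://example.com
        System.out.println(load("https://example.com", ParserLoadSketch::stripTags));
    }
}
```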
    • pageContent

      public String pageContent(String url)
      Loads the HTML content of a web page from the specified URL using Playwright.

      This method opens a new page in the browser, navigates to the given URL, waits for the DOM content to be fully loaded, and returns the full HTML content of the page.

      Parameters:
      url - the URL of the web page to load; must not be null
      Returns:
      the full HTML content of the page as a String
      Throws:
      NullPointerException - if the provided url is null
      RuntimeException - if the page fails to load or an unexpected error occurs during navigation
    • close

      public void close()
      Closes the underlying Browser instance.
      Specified by:
      close in interface AutoCloseable
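
Because the class implements AutoCloseable, a loader can be managed with try-with-resources so that close() runs even when loading throws. The sketch below shows the pattern with a hypothetical StubLoader that records its calls; it is not the real loader and makes no claim about how the real close() releases the browser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a loader that owns a resource needing cleanup.
final class StubLoader implements AutoCloseable {
    private final List<String> events;
    private boolean closed;

    StubLoader(List<String> events) {
        this.events = events;
    }

    String pageContent(String url) {
        if (closed) {
            throw new IllegalStateException("loader is closed");
        }
        events.add("fetch " + url);
        return "<html></html>";
    }

    @Override
    public void close() {
        closed = true;
        events.add("close");
    }
}

final class CloseSketch {
    // Uses the loader inside try-with-resources and returns the recorded calls.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        try (StubLoader loader = new StubLoader(events)) {
            loader.pageContent("https://example.com");
        } // close() is invoked here automatically
        return events;
    }

    public static void main(String[] args) {
        // prints: [fetch https://example.com, close]
        System.out.println(run());
    }
}
```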
    • builder

      public static PlaywrightDocumentLoader.Builder builder()
      Returns a PlaywrightDocumentLoader.Builder for constructing loader instances.