Class SeleniumDocumentLoader

java.lang.Object
dev.langchain4j.data.document.loader.selenium.SeleniumDocumentLoader
All Implemented Interfaces:
AutoCloseable

public class SeleniumDocumentLoader extends Object implements AutoCloseable
Utility class for loading web documents using Selenium. Its load methods return a Document object containing the content of the web page.
  • Method Details

    • pageReadyCondition

      public SeleniumDocumentLoader pageReadyCondition(org.openqa.selenium.support.ui.ExpectedCondition<Boolean> condition)
      Sets a custom page-ready condition that the loader waits on before treating the page as loaded.
      Parameters:
      condition - the ExpectedCondition to use
      Returns:
      this loader instance
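
      To make the idea of a page-ready condition concrete, the sketch below models how a Boolean condition can be polled until it holds or a timeout elapses. It uses only the JDK: the Page interface and the waitUntil helper are hypothetical stand-ins, not Selenium or LangChain4j types. With the real loader, the condition would be an ExpectedCondition<Boolean> evaluated against the WebDriver (for example, one that checks the browser's document.readyState) and passed to pageReadyCondition.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Stand-in sketch: models polling a Boolean "page ready" condition.
// None of these types belong to Selenium or LangChain4j.
class PageReadySketch {

    /** Hypothetical stand-in for a browser session. */
    interface Page {
        String readyState();
    }

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held in time, false otherwise.
     */
    static boolean waitUntil(Page page, Function<Page, Boolean> condition,
                             Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (Boolean.TRUE.equals(condition.apply(page))) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger polls = new AtomicInteger();
        // Simulated page that reports "complete" on the third poll.
        Page page = () -> polls.incrementAndGet() >= 3 ? "complete" : "loading";
        Function<Page, Boolean> ready = p -> "complete".equals(p.readyState());

        boolean ok = waitUntil(page, ready, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("ready=" + ok + " after " + polls.get() + " polls");
    }
}
```

      A custom condition is mostly useful for pages that keep rendering after the initial load, where a default readiness check may return too early.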
    • load

      public Document load(String url, DocumentParser documentParser)
      Loads a document from the specified URL and parses its content using the given DocumentParser.

      This method uses the configured WebDriver to navigate to the provided URL, and retrieves the raw page content. The content is then passed to the provided DocumentParser for parsing, and the resulting Document is returned with the URL added to its metadata.

      Parameters:
      url - The URL of the web page to load. Must not be null.
      documentParser - The parser used to extract structured text from the loaded page content. Must not be null.
      Returns:
      A Document containing parsed content and the source URL as metadata.
      Throws:
      NullPointerException - if the documentParser is null.
      RuntimeException - if an error occurs while loading or retrieving the content from the URL.
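
      The flow described above (retrieve the raw content, hand it to the parser, attach the URL as metadata) can be sketched without a browser. Everything in this block is a hypothetical stand-in written against the JDK only: Doc models Document, Parser models DocumentParser, the fetcher function models the WebDriver retrieval, and the "url" metadata key is an assumption. It is not the library's implementation. With the real loader, the equivalent call is load(url, documentParser).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Stand-in sketch of the fetch -> parse -> add-metadata flow.
class LoadFlowSketch {

    /** Hypothetical stand-in for Document: parsed text plus metadata. */
    static final class Doc {
        final String text;
        final Map<String, String> metadata;

        Doc(String text, Map<String, String> metadata) {
            this.text = text;
            this.metadata = metadata;
        }
    }

    /** Hypothetical stand-in for DocumentParser. */
    interface Parser {
        String parse(String rawContent);
    }

    static Doc load(String url, Parser parser, Function<String, String> fetcher) {
        // A null parser is rejected up front, mirroring the documented NullPointerException.
        Objects.requireNonNull(parser, "documentParser");
        try {
            String raw = fetcher.apply(url);      // models retrieving the page source
            String text = parser.parse(raw);      // models DocumentParser parsing
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("url", url);             // assumed metadata key
            return new Doc(text, metadata);
        } catch (RuntimeException e) {
            // Loading or parsing failures surface as a RuntimeException.
            throw new RuntimeException("Failed to load " + url, e);
        }
    }

    public static void main(String[] args) {
        // Crude tag-stripping parser, for illustration only.
        Parser stripTags = raw -> raw.replaceAll("<[^>]*>", " ").trim().replaceAll("\\s+", " ");
        Function<String, String> fakeFetch =
                u -> "<html><body><h1>Hello</h1><p>World</p></body></html>";

        Doc doc = load("https://example.com", stripTags, fakeFetch);
        System.out.println(doc.text + " " + doc.metadata);
        // prints: Hello World {url=https://example.com}
    }
}
```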
    • load

      public Document load(String url)
      Loads a document from the specified URL and wraps the raw page source as a Document.

      This method fetches the content of the given URL using the configured WebDriver, waits until the page is fully loaded, and returns a Document containing the raw HTML or text content along with the source URL as metadata.

      Parameters:
      url - The URL to load the document from. Must not be null.
      Returns:
      A Document containing the raw page source and the URL as metadata.
      Throws:
      RuntimeException - if the page fails to load or an error occurs during retrieval.
    • pageContent

      public String pageContent(String url)
      Retrieves the full page source of the given URL using Selenium.

      This method navigates the WebDriver to the specified URL, waits for the page to be fully loaded, and then returns the page content as a string.

      Parameters:
      url - The URL to load. Must not be null.
      Returns:
      The full HTML or text content of the loaded page.
      Throws:
      RuntimeException - if an error occurs while loading the page or retrieving the content.
    • close

      public void close()
      Closes the underlying WebDriver instance.
      Specified by:
      close in interface AutoCloseable
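
      Because the class implements AutoCloseable, a loader can be managed with try-with-resources so that close() runs even when loading fails. The sketch below shows that pattern with a hypothetical FakeLoader stand-in (JDK only, no browser involved). A real SeleniumDocumentLoader would take its place in the try header.

```java
// Stand-in sketch: try-with-resources guarantees close() is called.
class CloseSketch {

    /** Hypothetical stand-in for a loader that holds a browser session. */
    static final class FakeLoader implements AutoCloseable {
        private boolean closed;

        String pageContent(String url) {
            if (closed) {
                throw new IllegalStateException("loader already closed");
            }
            if (url.isEmpty()) {
                throw new RuntimeException("cannot load an empty URL");
            }
            return "<html>" + url + "</html>";
        }

        @Override
        public void close() {
            closed = true; // a real loader would release the WebDriver here
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Fetches one page and always closes the loader afterwards. */
    static String fetch(FakeLoader loader, String url) {
        try (FakeLoader l = loader) {
            return l.pageContent(url);
        }
    }

    public static void main(String[] args) {
        FakeLoader ok = new FakeLoader();
        System.out.println(fetch(ok, "https://example.com") + " closed=" + ok.isClosed());

        FakeLoader failing = new FakeLoader();
        try {
            fetch(failing, "");
        } catch (RuntimeException e) {
            // The loader is closed even though pageContent threw.
            System.out.println("error: " + e.getMessage() + " closed=" + failing.isClosed());
        }
    }
}
```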
    • builder

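      public static SeleniumDocumentLoader.Builder builder()
      Creates a new builder for configuring a SeleniumDocumentLoader.
      Returns:
      a new SeleniumDocumentLoader.Builder instance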