java.lang.Object

dev.langchain4j.data.document.loader.selenium.SeleniumDocumentLoader

All Implemented Interfaces: AutoCloseable

public class SeleniumDocumentLoader extends Object implements AutoCloseable

Utility class for loading web documents using Selenium. The load methods return a Document containing the content of the web page.

Nested Class Summary

Nested Classes

Modifier and Type | Class | Description
static class | SeleniumDocumentLoader.Builder |

Method Summary

Modifier and Type | Method | Description
static SeleniumDocumentLoader.Builder | builder() |
void | close() | Closes the underlying WebDriver instance.
Document | load(String url) | Loads a document from the specified URL and wraps the raw page source as a Document.
Document | load(String url, DocumentParser documentParser) | Loads a document from the specified URL and parses its content using the given DocumentParser.
String | pageContent(String url) | Retrieves the full page source of the given URL using Selenium.
SeleniumDocumentLoader | pageReadyCondition(org.openqa.selenium.support.ui.ExpectedCondition<Boolean> condition) | Sets a custom page-ready condition used to wait until the page is loaded.

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- pageReadyCondition
  
  public SeleniumDocumentLoader pageReadyCondition(org.openqa.selenium.support.ui.ExpectedCondition<Boolean> condition)
  
  Sets a custom page-ready condition used to wait until the page is loaded.
  
  Parameters:
  
  condition - the ExpectedCondition to use
  
  Returns:
  
  this loader instance
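
  Waiting on a page-ready condition generally means re-evaluating a predicate until it returns true or a timeout expires; Selenium's WebDriverWait.until(...) works this way with an ExpectedCondition. The plain-Java sketch below models only that polling idea, with a BooleanSupplier standing in for the ExpectedCondition<Boolean>. The class PageReadyWait, its waitUntil method, and the timeout and poll-interval values are hypothetical; they are not part of langchain4j or Selenium, and the loader's own waiting logic may differ.

  ```java
  import java.time.Duration;
  import java.util.function.BooleanSupplier;

  // Hypothetical helper: models "wait until a page-ready condition holds" in plain Java.
  // It is NOT the langchain4j or Selenium implementation.
  class PageReadyWait {

      // Re-evaluates `condition` every `pollInterval`.
      // Returns true as soon as the condition holds; returns false if it is
      // still false after `timeout` has elapsed (or if the thread is interrupted).
      static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
          long deadline = System.nanoTime() + timeout.toNanos();
          while (!condition.getAsBoolean()) {
              if (System.nanoTime() - deadline >= 0) {
                  return false; // timed out: page never reported ready
              }
              try {
                  Thread.sleep(pollInterval.toMillis());
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt(); // preserve the interrupt flag
                  return false;
              }
          }
          return true;
      }

      public static void main(String[] args) {
          // Simulated page that reports "ready" on the third check.
          int[] checks = {0};
          boolean ready = waitUntil(() -> ++checks[0] >= 3, Duration.ofSeconds(2), Duration.ofMillis(10));
          System.out.println(ready + " after " + checks[0] + " checks"); // true after 3 checks
      }
  }
  ```

  With the real API, a Selenium condition of the matching type, such as ExpectedConditions.titleContains("..."), which returns ExpectedCondition<Boolean>, is the kind of value pageReadyCondition(...) accepts.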
- load
  
  public Document load(String url, DocumentParser documentParser)
  
  Loads a document from the specified URL and parses its content using the given DocumentParser.
  This method uses the configured WebDriver to navigate to the provided URL and retrieve the raw page content. The content is then passed to the provided DocumentParser for parsing, and the resulting Document is returned with the URL added to its metadata.
  
  Parameters:
  
  url - The URL of the web page to load. Must not be null.
  
  documentParser - The parser used to extract structured text from the loaded page content. Must not be null.
  
  Returns:
  
  A Document containing parsed content and the source URL as metadata.
  
  Throws:
  
  NullPointerException - if the documentParser is null.
  
  RuntimeException - if an error occurs while loading or retrieving the content from the URL.
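
  The documented contract for this overload is: a null parser causes a NullPointerException, the raw page content is handed to the parser, and the source URL is added to the resulting document's metadata. The plain-Java sketch below models that contract under stated assumptions: the raw page source is passed in directly instead of being fetched by a WebDriver, a Function<String, String> stands in for DocumentParser, the hypothetical SimpleDocument class stands in for Document, and the metadata key "url" is assumed. The stripTags parser is a toy for illustration only, not a real HTML parser.

  ```java
  import java.util.HashMap;
  import java.util.Map;
  import java.util.Objects;
  import java.util.function.Function;

  // Hypothetical model of load(url, parser); not the langchain4j implementation.
  class ParsedLoadSketch {

      // Stand-in for a Document: extracted text plus string metadata.
      static final class SimpleDocument {
          final String text;
          final Map<String, String> metadata = new HashMap<>();
          SimpleDocument(String text) { this.text = text; }
      }

      // Assumed metadata key for the source URL.
      static final String URL_KEY = "url";

      static SimpleDocument load(String url, String rawPageSource, Function<String, String> parser) {
          Objects.requireNonNull(parser, "documentParser"); // null parser -> NullPointerException
          String text = parser.apply(rawPageSource);         // parse the raw page content
          SimpleDocument doc = new SimpleDocument(text);
          doc.metadata.put(URL_KEY, url);                     // record where the content came from
          return doc;
      }

      // Toy "parser": replaces tags with spaces and collapses whitespace.
      static String stripTags(String html) {
          return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
      }

      public static void main(String[] args) {
          SimpleDocument doc = load("https://example.com",
                  "<html><body><h1>Hello</h1><p>world</p></body></html>",
                  ParsedLoadSketch::stripTags);
          System.out.println(doc.text);                 // Hello world
          System.out.println(doc.metadata.get("url"));  // https://example.com
      }
  }
  ```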
- load
  
  public Document load(String url)
  
  Loads a document from the specified URL and wraps the raw page source as a Document.
  This method fetches the content of the given URL using the configured WebDriver, waits until the page is fully loaded, and returns a Document containing the raw HTML or text content along with the source URL as metadata.
  
  Parameters:
  
  url - The URL to load the document from. Must not be null.
  
  Returns:
  
  A Document containing the raw page source and the URL as metadata.
  
  Throws:
  
  RuntimeException - if the page fails to load or an error occurs during retrieval.
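
  Unlike the parser overload, this method keeps the page source as-is, so the document text still contains the page's HTML markup. A minimal plain-Java sketch of that contract, assuming that the hypothetical SimpleDocument class plays the role of Document, that the raw source is passed in rather than fetched by a WebDriver, and that the metadata key is "url":

  ```java
  import java.util.Map;

  // Hypothetical model of load(url): the raw page source becomes the document text
  // unchanged, and the URL is recorded as metadata. Not the langchain4j implementation.
  class RawLoadSketch {

      // Stand-in for a Document.
      static final class SimpleDocument {
          final String text;
          final Map<String, String> metadata;
          SimpleDocument(String text, Map<String, String> metadata) {
              this.text = text;
              this.metadata = metadata;
          }
      }

      static SimpleDocument load(String url, String rawPageSource) {
          // No parsing step: markup, scripts and styles all stay in the text.
          // The "url" metadata key is an assumption of this sketch.
          return new SimpleDocument(rawPageSource, Map.of("url", url));
      }

      public static void main(String[] args) {
          SimpleDocument doc = load("https://example.com", "<html><body><p>Hi</p></body></html>");
          System.out.println(doc.text);                 // <html><body><p>Hi</p></body></html>
          System.out.println(doc.metadata.get("url"));  // https://example.com
      }
  }
  ```

  Because the text is unparsed, the parser overload is typically the better fit when the content will later be split or embedded.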
- pageContent
  
  public String pageContent(String url)
  
  Retrieves the full page source of the given URL using Selenium.
  This method navigates the WebDriver to the specified URL, waits for the page to be fully loaded, and then returns the page content as a string.
  
  Parameters:
  
  url - The URL to load. Must not be null.
  
  Returns:
  
  The full HTML or text content of the loaded page.
  
  Throws:
  
  RuntimeException - if an error occurs while loading the page or retrieving the content.
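
  The three steps described above (navigate, wait for the page to be ready, return the page source) and the documented RuntimeException on failure can be modeled as follows. Browser is an invented interface standing in for Selenium's WebDriver, and the fixed 10 ms pause and the maxChecks limit are arbitrary illustration values; the real method waits using the loader's own page-ready logic, which this sketch does not reproduce.

  ```java
  // Hypothetical model of pageContent(url): navigate, wait until ready, read the source.
  // Browser is an invented stand-in for Selenium's WebDriver, not a real API.
  class PageContentSketch {

      interface Browser {
          void navigateTo(String url) throws Exception; // may fail, e.g. unreachable host
          boolean isReady();                           // page-ready check
          String pageSource();                         // full HTML or text of the page
      }

      static String pageContent(Browser browser, String url, int maxChecks) {
          try {
              browser.navigateTo(url);
              int checks = 1;
              while (!browser.isReady()) {
                  if (checks++ >= maxChecks) {
                      throw new IllegalStateException("Page not ready after " + maxChecks + " checks: " + url);
                  }
                  Thread.sleep(10); // short pause between readiness checks
              }
              return browser.pageSource();
          } catch (RuntimeException e) {
              throw e; // already unchecked: propagate as-is
          } catch (Exception e) {
              if (e instanceof InterruptedException) {
                  Thread.currentThread().interrupt(); // preserve the interrupt flag
              }
              // Checked failures surface as a RuntimeException, matching the documented behavior.
              throw new RuntimeException("Failed to load " + url, e);
          }
      }

      public static void main(String[] args) {
          Browser browser = new Browser() {
              private int polls = 0;
              public void navigateTo(String url) { }
              public boolean isReady() { return ++polls >= 2; } // ready on the second check
              public String pageSource() { return "<html><body>ok</body></html>"; }
          };
          System.out.println(pageContent(browser, "https://example.com", 5));
      }
  }
  ```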
- close
  
  public void close()
  
  Closes the underlying WebDriver instance.
  
  Specified by:
  
  close in interface AutoCloseable
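
  Because the loader implements AutoCloseable, try-with-resources is the natural way to make sure the underlying WebDriver is released even if loading throws. The sketch below shows that pattern with invented stand-ins: FakeDriver replaces WebDriver and Loader replaces SeleniumDocumentLoader. Releasing the driver through a quit() method is an assumption of the sketch; this page states only that close() closes the underlying WebDriver instance.

  ```java
  // Hypothetical model of a loader that owns a browser session and releases it in close().
  // FakeDriver and Loader are invented stand-ins, not langchain4j or Selenium classes.
  class CloseSketch {

      static final class FakeDriver {
          boolean quitCalled = false;
          void quit() { quitCalled = true; } // assumed release call
      }

      static final class Loader implements AutoCloseable {
          final FakeDriver driver;
          Loader(FakeDriver driver) { this.driver = driver; }

          String pageContent(String url) {
              if (driver.quitCalled) {
                  throw new IllegalStateException("driver already closed");
              }
              return "<html>" + url + "</html>";
          }

          @Override
          public void close() {
              driver.quit(); // release the underlying browser session
          }
      }

      public static void main(String[] args) {
          FakeDriver driver = new FakeDriver();
          try (Loader loader = new Loader(driver)) {
              System.out.println(loader.pageContent("https://example.com"));
          } // close() runs here, even if pageContent throws
          System.out.println("driver quit: " + driver.quitCalled); // driver quit: true
      }
  }
  ```

  With the real class the shape would be the same, for example `try (SeleniumDocumentLoader loader = ...) { Document doc = loader.load(url); }`; how the loader is constructed through its Builder is not documented on this page, so the `...` is left open.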
- builder
  
  public static SeleniumDocumentLoader.Builder builder()
