Class SeleniumDocumentLoader
java.lang.Object
dev.langchain4j.data.document.loader.selenium.SeleniumDocumentLoader
All Implemented Interfaces:
AutoCloseable
Utility class for loading web documents using Selenium. Returns a Document object containing the content of the web page.
Nested Class Summary
Nested Classes -
Method Summary
Modifier and Type, Method, Description
builder()
void close() - Closes the underlying WebDriver instance.
load(String url) - Loads a document from the specified URL and wraps the raw page source as a Document.
load(String url, DocumentParser documentParser) - Loads a document from the specified URL and parses its content using the given DocumentParser.
pageContent(String url) - Retrieves the full page source of the given URL using Selenium.
pageReadyCondition(org.openqa.selenium.support.ui.ExpectedCondition<Boolean> condition) - Set a custom page ready condition for waiting until the page is loaded.
-
Method Details
-
pageReadyCondition
public SeleniumDocumentLoader pageReadyCondition(org.openqa.selenium.support.ui.ExpectedCondition<Boolean> condition)
Set a custom page ready condition for waiting until the page is loaded.
Parameters:
condition - the ExpectedCondition to use
Returns:
this loader instance
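A page ready condition is essentially a function from the driver to a Boolean that is polled until it returns true. In Selenium, ExpectedCondition<T> extends java.util.function.Function<WebDriver, T>. How this loader waits internally is not shown on this page, so the following is only a minimal, dependency-free sketch of the idea. FakeBrowser, waitUntil, and the polling interval are hypothetical names and choices made for illustration, not part of Selenium or LangChain4j.

```java
import java.time.Duration;
import java.util.function.Function;

class PageReadySketch {

    /** Hypothetical stand-in for a browser session; not a Selenium type. */
    static final class FakeBrowser {
        private int polls = 0;

        /** Reports "loading" for the first two polls, then "complete". */
        String readyState() {
            polls++;
            return polls >= 3 ? "complete" : "loading";
        }
    }

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static <D> boolean waitUntil(D driver, Function<D, Boolean> condition,
                                 Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (Boolean.TRUE.equals(condition.apply(driver))) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        FakeBrowser browser = new FakeBrowser();
        // The condition plays the role of an ExpectedCondition<Boolean>.
        Function<FakeBrowser, Boolean> ready = b -> "complete".equals(b.readyState());
        boolean loaded = waitUntil(browser, ready, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("page ready: " + loaded);
    }
}
```

With the real class, the condition passed to pageReadyCondition would be an ExpectedCondition<Boolean> taking a WebDriver, and the call returns the loader so it can be chained.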
-
load
Loads a document from the specified URL and parses its content using the given DocumentParser.
This method uses the configured WebDriver to navigate to the provided URL and retrieves the raw page content. The content is then passed to the provided DocumentParser for parsing, and the resulting Document is returned with the URL added to its metadata.
Parameters:
url - The URL of the web page to load. Must not be null.
documentParser - The parser used to extract structured text from the loaded page content. Must not be null.
Returns:
A Document containing parsed content and the source URL as metadata.
Throws:
NullPointerException - if the documentParser is null.
RuntimeException - if an error occurs while loading or retrieving the content from the URL.
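The flow described above (fetch the raw page, hand it to a parser, attach the URL as metadata) can be sketched without any dependencies. This is an illustration of the described steps under stated assumptions, not the library's implementation: ParsedPage, the function-typed fetcher and parser, the stripTags helper, and the "url" metadata key are all hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

class LoadAndParseSketch {

    /** Hypothetical stand-in for a parsed document: text plus metadata. */
    static final class ParsedPage {
        final String text;
        final Map<String, String> metadata;

        ParsedPage(String text, Map<String, String> metadata) {
            this.text = text;
            this.metadata = metadata;
        }
    }

    /** Fetches the raw page, parses it, and records the source URL. */
    static ParsedPage load(String url,
                           Function<String, String> fetchPageSource,
                           Function<String, String> parser) {
        Objects.requireNonNull(url, "url");
        Objects.requireNonNull(parser, "parser"); // mirrors the documented NullPointerException
        String raw;
        try {
            raw = fetchPageSource.apply(url);
        } catch (RuntimeException e) {
            // Mirrors the documented RuntimeException on load failure.
            throw new RuntimeException("Failed to load " + url, e);
        }
        String text = parser.apply(raw);
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("url", url); // key name is an assumption for this sketch
        return new ParsedPage(text, metadata);
    }

    /** Naive tag stripper used only for this illustration. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        Function<String, String> fakeFetch =
                u -> "<html><body><h1>Hello</h1><p>World</p></body></html>";
        ParsedPage page = load("https://example.com", fakeFetch, LoadAndParseSketch::stripTags);
        System.out.println(page.text);
        System.out.println(page.metadata);
    }
}
```

In the real API the parser argument is a DocumentParser and the result is a Document; the sketch only shows the order of the steps and the two documented failure modes.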
-
load
Loads a document from the specified URL and wraps the raw page source as a Document.
This method fetches the content of the given URL using the configured WebDriver, waits until the page is fully loaded, and returns a Document containing the raw HTML or text content along with the source URL as metadata.
Parameters:
url - The URL to load the document from. Must not be null.
Returns:
A Document containing the raw page source and the URL as metadata.
Throws:
RuntimeException - if the page fails to load or an error occurs during retrieval.
-
pageContent
Retrieves the full page source of the given URL using Selenium.
This method navigates the WebDriver to the specified URL, waits for the page to be fully loaded, and then returns the page content as a string.
Parameters:
url - The URL to load. Must not be null.
Returns:
The full HTML or text content of the loaded page.
Throws:
RuntimeException - if an error occurs while loading the page or retrieving the content.
-
close
public void close()
Closes the underlying WebDriver instance.
Specified by:
close in interface AutoCloseable
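Because the class implements AutoCloseable, a try-with-resources block is a natural way to make sure the browser session is released even when loading fails. The sketch below demonstrates that guarantee with a hypothetical FakeLoader stand-in, so it runs without Selenium; it is not the library's class.

```java
class TryWithResourcesSketch {

    /** Hypothetical stand-in for a loader that owns a browser session. */
    static final class FakeLoader implements AutoCloseable {
        private boolean closed = false;

        String pageContent(String url) {
            if (closed) {
                throw new IllegalStateException("loader is closed");
            }
            return "<html>stub for " + url + "</html>";
        }

        @Override
        public void close() {
            closed = true; // a real loader would shut down its browser here
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Returns true if close() ran even though the body threw. */
    static boolean closesEvenOnFailure() {
        FakeLoader loader = new FakeLoader();
        try (FakeLoader l = loader) {
            l.pageContent("https://example.com");
            throw new IllegalStateException("simulated failure after loading");
        } catch (IllegalStateException e) {
            // By the time control reaches this catch block, close() has already run.
        }
        return loader.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("closed after failure: " + closesEvenOnFailure());
    }
}
```

The same pattern would apply to a SeleniumDocumentLoader instance, where close() closes the underlying WebDriver.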
-
builder
-