Class PlaywrightDocumentLoader
java.lang.Object
dev.langchain4j.data.document.loader.playwright.PlaywrightDocumentLoader
All Implemented Interfaces:
AutoCloseable
Utility class for loading web documents using Playwright. Returns a Document object containing the content of the web page.
Nested Class Summary
Nested Classes
Method Summary
Modifier and Type | Method | Description
 | builder() |
void | close() | Closes the underlying Browser instance.
 | load(String url) | Loads a Document from the specified URL by fetching its HTML content.
 | load(String url, DocumentParser documentParser) | Loads a Document from the specified URL by fetching its HTML content and parsing it using the provided DocumentParser.
 | pageContent(String url) | Loads the HTML content of a web page from the specified URL using Playwright.
Method Details
load
Loads a Document from the specified URL by fetching its HTML content.
This method fetches the page content using pageContent(url), and then wraps it into a Document with metadata indicating the source URL.
Parameters:
url - the URL of the web page to load; must not be null
Returns:
a Document instance containing the HTML content and metadata
Throws:
NullPointerException - if the provided url is null
RuntimeException - if the document fails to load due to an underlying content fetch issue
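The fetch-then-wrap behaviour described above can be sketched as follows. This is a minimal, self-contained illustration, not the library's implementation: SketchDocument, LoadSketch, the injected fetcher function (standing in for pageContent(url)), and the metadata key "url" are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical stand-in for the library's Document type: text plus metadata.
final class SketchDocument {
    final String text;
    final Map<String, String> metadata;

    SketchDocument(String text, Map<String, String> metadata) {
        this.text = text;
        this.metadata = metadata;
    }
}

final class LoadSketch {
    // Hypothetical stand-in for pageContent(url); a real loader would drive a browser here.
    private final Function<String, String> fetcher;

    LoadSketch(Function<String, String> fetcher) {
        this.fetcher = Objects.requireNonNull(fetcher, "fetcher");
    }

    SketchDocument load(String url) {
        // A null URL is rejected up front with a NullPointerException.
        Objects.requireNonNull(url, "url");
        String html;
        try {
            html = fetcher.apply(url);
        } catch (Exception e) {
            // Fetch failures surface as a RuntimeException, matching the Throws list.
            throw new RuntimeException("Failed to load document from " + url, e);
        }
        // Assumption: the source URL is recorded under the metadata key "url".
        return new SketchDocument(html, Map.of("url", url));
    }

    public static void main(String[] args) {
        LoadSketch loader = new LoadSketch(u -> "<html><body>Hello</body></html>");
        SketchDocument doc = loader.load("https://example.com");
        System.out.println(doc.metadata.get("url") + " -> " + doc.text.length() + " characters");
    }
}
```

Injecting the fetcher keeps the sketch runnable without a browser; the null check sits outside the try block so that a null URL is reported as a NullPointerException and is not wrapped.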
load
Loads a Document from the specified URL by fetching its HTML content and parsing it using the provided DocumentParser.
This method delegates content parsing to the given parser and adds the source URL to the document's metadata.
Parameters:
url - the URL of the web page to load; must not be null
documentParser - the parser used to convert raw HTML content into a Document; must not be null
Returns:
a Document parsed from the web page content, with metadata including the URL
Throws:
NullPointerException - if url or documentParser is null
RuntimeException - if the document content cannot be loaded or parsed
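The delegation to a parser described above can be sketched as follows. Everything here is a hypothetical stand-in chosen to keep the example self-contained: ParsedPage and PageParser are not the library's Document and DocumentParser types, the rawHtml argument replaces the browser fetch, the regex tag stripper is deliberately naive, and the metadata key "url" is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a parsed document: text plus mutable metadata.
final class ParsedPage {
    final String text;
    final Map<String, String> metadata = new HashMap<>();

    ParsedPage(String text) {
        this.text = text;
    }
}

// Hypothetical parser contract: turn a stream of raw HTML into a document.
interface PageParser {
    ParsedPage parse(InputStream html) throws IOException;
}

final class ParserLoadSketch {
    // Naive illustrative parser that strips tags with a regex; not a real HTML parser.
    static final PageParser TAG_STRIPPER = in -> {
        String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        return new ParsedPage(html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim());
    };

    static ParsedPage load(String url, String rawHtml, PageParser parser) {
        // Both arguments are checked before any work is done.
        Objects.requireNonNull(url, "url");
        Objects.requireNonNull(parser, "parser");
        try (InputStream in = new ByteArrayInputStream(rawHtml.getBytes(StandardCharsets.UTF_8))) {
            ParsedPage page = parser.parse(in);
            // Assumption: the source URL is stored under the metadata key "url".
            page.metadata.put("url", url);
            return page;
        } catch (IOException e) {
            throw new RuntimeException("Failed to parse content of " + url, e);
        }
    }

    public static void main(String[] args) {
        ParsedPage page = load("https://example.com", "<h1>Title</h1><p>Body text</p>", TAG_STRIPPER);
        System.out.println(page.text + " " + page.metadata);
    }
}
```

The parser decides what the document text looks like, while the loader only contributes the source URL, so swapping in a different parser changes the text without touching the loading logic.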
pageContent
Loads the HTML content of a web page from the specified URL using Playwright.
This method opens a new page in the browser, navigates to the given URL, waits for the DOM content to be fully loaded, and returns the full HTML content of the page.
Parameters:
url - the URL of the web page to load; must not be null
Returns:
the full HTML content of the page as a String
Throws:
NullPointerException - if the provided url is null
RuntimeException - if the page fails to load or an unexpected error occurs during navigation
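The open, navigate, wait, and read sequence described above can be sketched as follows. PageSession and FakePage are hypothetical stand-ins, not Playwright types, so that the example runs without a browser. In Playwright for Java the corresponding calls would likely be Browser.newPage(), Page.navigate(url), Page.waitForLoadState(LoadState.DOMCONTENTLOADED), and Page.content(), but the loader's internals are not shown on this page, so that mapping is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical stand-in for a browser page; these are not Playwright's own types.
interface PageSession extends AutoCloseable {
    void navigate(String url);
    void waitForDomContentLoaded();
    String content();
    @Override
    void close();
}

final class PageContentSketch {
    static String pageContent(Supplier<PageSession> newPage, String url) {
        // Checked outside the try block so a null URL is not wrapped.
        Objects.requireNonNull(url, "url");
        // try-with-resources closes the page even when navigation fails.
        try (PageSession page = newPage.get()) {
            page.navigate(url);
            page.waitForDomContentLoaded();
            return page.content();
        } catch (RuntimeException e) {
            throw new RuntimeException("Failed to load page: " + url, e);
        }
    }

    // In-memory fake that records the order of calls, for demonstration only.
    static final class FakePage implements PageSession {
        final List<String> calls = new ArrayList<>();

        public void navigate(String url) { calls.add("navigate"); }
        public void waitForDomContentLoaded() { calls.add("wait"); }
        public String content() { calls.add("content"); return "<html></html>"; }
        public void close() { calls.add("close"); }
    }

    public static void main(String[] args) {
        FakePage page = new FakePage();
        String html = pageContent(() -> page, "https://example.com");
        System.out.println(html + " via " + page.calls);
    }
}
```

Opening a fresh page per call and closing it afterwards keeps one slow or failing URL from leaking state into the next request; whether the real loader does the same is not stated here.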
close
public void close()
Closes the underlying Browser instance.
Specified by:
close in interface AutoCloseable
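Because the class implements AutoCloseable, it can be managed with try-with-resources so that close() runs even if loading throws. The sketch below shows the pattern with a hypothetical stand-in class, CloseableLoaderSketch, whose pageContent method returns canned text; it is not the real loader.

```java
// Hypothetical stand-in for the loader: an AutoCloseable that owns a browser-like resource.
final class CloseableLoaderSketch implements AutoCloseable {
    private boolean closed;

    // Returns canned content; a real loader would fetch the page here.
    String pageContent(String url) {
        return "<html><!-- content of " + url + " --></html>";
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // A real loader would close its Browser here.
        closed = true;
    }

    public static void main(String[] args) {
        CloseableLoaderSketch loader = new CloseableLoaderSketch();
        // The resource is closed automatically when the block exits, normally or not.
        try (CloseableLoaderSketch l = loader) {
            System.out.println(l.pageContent("https://example.com"));
        }
        System.out.println("closed: " + loader.isClosed());
    }
}
```

With the real class the same shape should apply: declare the loader in the try header and call load or pageContent inside the block, which avoids leaving a browser process running after a failure.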
builder