Class PlaywrightDocumentLoader
java.lang.Object
dev.langchain4j.data.document.loader.playwright.PlaywrightDocumentLoader
All Implemented Interfaces:
AutoCloseable
Utility class for loading web documents using Playwright.
Its load methods return a Document object containing the content of the web page.
Nested Class Summary
Nested Classes
Method Summary
Modifier and Type | Method | Description
 | builder() |
void | close() | Closes the underlying Browser instance.
Document | load(String url) | Loads a Document from the specified URL by fetching its HTML content.
Document | load(String url, DocumentParser documentParser) | Loads a Document from the specified URL by fetching its HTML content and parsing it using the provided DocumentParser.
String | pageContent(String url) | Loads the HTML content of a web page from the specified URL using Playwright.
Method Details
load
Loads a Document from the specified URL by fetching its HTML content.
This method fetches the page content using pageContent(url) and then wraps it in a Document with metadata indicating the source URL.
Parameters:
url - the URL of the web page to load; must not be null
Returns:
a Document instance containing the HTML content and metadata
Throws:
NullPointerException - if the provided url is null
RuntimeException - if the document fails to load due to an underlying content-fetch issue
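The flow described above (fetch the HTML, then wrap it with the source URL) can be illustrated without a browser. This is a minimal sketch, not the library's code: SketchDocument and SketchUrlLoader are hypothetical stand-ins for Document and PlaywrightDocumentLoader, the fetch step is injected as a Function<String, String> in place of pageContent(url), and the metadata key "url" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical stand-in for a Document: text plus string metadata.
record SketchDocument(String text, Map<String, String> metadata) {}

// Hypothetical loader illustrating load(url): fetch the HTML, then wrap it with the source URL.
class SketchUrlLoader {
    // Stands in for pageContent(url); in the real class this step uses Playwright.
    private final Function<String, String> pageContent;

    SketchUrlLoader(Function<String, String> pageContent) {
        this.pageContent = Objects.requireNonNull(pageContent);
    }

    SketchDocument load(String url) {
        // Mirrors the documented NullPointerException for a null url.
        Objects.requireNonNull(url, "url must not be null");
        // Any RuntimeException thrown by the fetch step propagates to the caller.
        String html = pageContent.apply(url);
        Map<String, String> metadata = new HashMap<>();
        metadata.put("url", url); // assumed metadata key
        return new SketchDocument(html, metadata);
    }
}
```

Injecting the fetch step keeps the sketch testable offline; the real class obtains the HTML from a Playwright-controlled browser instead.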
load
Loads a Document from the specified URL by fetching its HTML content and parsing it using the provided DocumentParser.
This method delegates content parsing to the given parser and adds the source URL to the document's metadata.
Parameters:
url - the URL of the web page to load; must not be null
documentParser - the parser used to convert raw HTML content into a Document; must not be null
Returns:
a Document parsed from the web page content, with metadata including the URL
Throws:
NullPointerException - if url or documentParser is null
RuntimeException - if the document content cannot be loaded or parsed
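The delegation described above can be sketched as follows. Everything here is hypothetical: SketchParser is a stand-in loosely modeled on a stream-based document parser, TagStrippingParser uses a naive regex purely for illustration, and the metadata key "url" is an assumption. The sketch copies the parser's metadata before adding the URL so it does not depend on the parser returning a mutable map.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical document type: parsed text plus string metadata.
record ParsedDoc(String text, Map<String, String> metadata) {}

// Hypothetical parser contract that consumes raw bytes.
interface SketchParser {
    ParsedDoc parse(InputStream in);
}

// Hypothetical loader illustrating load(url, documentParser).
class ParsingLoader {
    private final Function<String, String> pageContent; // stands in for pageContent(url)

    ParsingLoader(Function<String, String> pageContent) {
        this.pageContent = Objects.requireNonNull(pageContent);
    }

    ParsedDoc load(String url, SketchParser parser) {
        Objects.requireNonNull(url, "url must not be null");
        Objects.requireNonNull(parser, "documentParser must not be null");
        String html = pageContent.apply(url);
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        ParsedDoc parsed = parser.parse(in); // parsing is delegated entirely to the parser
        Map<String, String> metadata = new HashMap<>(parsed.metadata());
        metadata.put("url", url); // then the source URL is added (assumed key)
        return new ParsedDoc(parsed.text(), metadata);
    }
}

// Hypothetical parser that strips tags with a naive regex (illustration only, not robust HTML parsing).
class TagStrippingParser implements SketchParser {
    @Override
    public ParsedDoc parse(InputStream in) {
        try {
            String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            return new ParsedDoc(html.replaceAll("<[^>]*>", "").trim(), new HashMap<>());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the parser behind an interface means the loader never needs to know the output format; swapping in a different parser changes only the Document's text, not how the URL is recorded.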
pageContent
Loads the HTML content of a web page from the specified URL using Playwright.
This method opens a new page in the browser, navigates to the given URL, waits for the DOM content to be fully loaded, and returns the full HTML content of the page.
Parameters:
url - the URL of the web page to load; must not be null
Returns:
the full HTML content of the page as a String
Throws:
NullPointerException - if the provided url is null
RuntimeException - if the page fails to load or an unexpected error occurs during navigation
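The steps described above (open a page, navigate, wait for the DOM content, read the HTML) can be sketched against hypothetical SketchBrowser and SketchPage interfaces so that the example runs without Playwright. In Playwright for Java the analogous calls would be Browser.newPage(), Page.navigate(url), Page.waitForLoadState(LoadState.DOMCONTENTLOADED) and Page.content(), but whether PlaywrightDocumentLoader uses exactly these is an assumption. Closing the page afterwards and wrapping failures in a RuntimeException that names the URL are also choices of this sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical minimal page abstraction (Playwright's real Page API is much richer).
interface SketchPage extends AutoCloseable {
    void navigate(String url);        // go to the URL
    void waitForDomContentLoaded();   // block until the DOM content is loaded
    String content();                 // full serialized HTML of the page
    @Override void close();           // narrowed: no checked exception
}

// Hypothetical browser abstraction that can open pages.
interface SketchBrowser {
    SketchPage newPage();
}

final class PageContentSketch {
    private PageContentSketch() {}

    static String pageContent(SketchBrowser browser, String url) {
        Objects.requireNonNull(browser, "browser must not be null");
        Objects.requireNonNull(url, "url must not be null"); // checked before any browser work
        // try-with-resources closes the page even if navigation or reading fails.
        try (SketchPage page = browser.newPage()) {
            page.navigate(url);
            page.waitForDomContentLoaded();
            return page.content();
        } catch (RuntimeException e) {
            throw new RuntimeException("Failed to load page content from " + url, e);
        }
    }
}

// Fake browser for demonstration: records the order of calls made on its pages.
final class RecordingBrowser implements SketchBrowser {
    final List<String> calls = new ArrayList<>();
    private final String html;

    RecordingBrowser(String html) {
        this.html = html;
    }

    @Override
    public SketchPage newPage() {
        calls.add("newPage");
        return new SketchPage() {
            public void navigate(String url) { calls.add("navigate " + url); }
            public void waitForDomContentLoaded() { calls.add("wait"); }
            public String content() { calls.add("content"); return html; }
            public void close() { calls.add("close"); }
        };
    }
}
```

With RecordingBrowser, a call to pageContent records newPage, navigate, wait and content in that order, followed by close once the try block exits.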
close
public void close()
Closes the underlying Browser instance.
Specified by:
close in interface AutoCloseable
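Because the class implements AutoCloseable, try-with-resources is a natural way to guarantee that the browser is released. The sketch below uses hypothetical stand-ins (FakeBrowserHandle for a Playwright Browser, ClosingLoader for the loader) because this section does not document how builder() configures a real instance.

```java
import java.util.Objects;

// Hypothetical handle standing in for a Playwright Browser.
final class FakeBrowserHandle implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical loader that owns the browser and releases it in close().
final class ClosingLoader implements AutoCloseable {
    private final FakeBrowserHandle browser;

    ClosingLoader(FakeBrowserHandle browser) {
        this.browser = Objects.requireNonNull(browser);
    }

    String describe(String url) {
        if (browser.isClosed()) {
            throw new IllegalStateException("browser already closed");
        }
        return "would load " + url;
    }

    // Narrows AutoCloseable.close() so callers need not catch a checked exception.
    @Override
    public void close() {
        browser.close();
    }
}
```

Declaring close() without a checked exception lets callers use try-with-resources without a catch clause; the browser is closed when the block exits, whether normally or by an exception.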
builder