Interface DocumentTransformer

All Known Implementing Classes:
HtmlTextExtractor

public interface DocumentTransformer
Defines the contract for transforming a Document. Implementations may modify a document's content, enrich it with additional information, or filter it out entirely.
  • Method Details

    • transform

      Document transform(Document document)
      Transforms the provided document.
      Parameters:
      document - The document to be transformed.
      Returns:
      The transformed document, or null if the document should be filtered out.
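
The null-return convention described above can be illustrated with a minimal sketch. The Document and DocumentTransformer types below are simplified stand-ins declared only so the example compiles on its own, and BlankFilteringTransformer is a hypothetical implementation, not a class from the library.

```java
// Simplified stand-in for the library's Document type (assumption: only the text matters here).
final class Document {
    private final String text;

    Document(String text) {
        this.text = text;
    }

    String text() {
        return text;
    }
}

// Stand-in mirroring the single abstract method documented above.
interface DocumentTransformer {
    Document transform(Document document);
}

// Hypothetical implementation: trims surrounding whitespace and filters out blank documents.
class BlankFilteringTransformer implements DocumentTransformer {
    @Override
    public Document transform(Document document) {
        String trimmed = document.text().trim();
        if (trimmed.isEmpty()) {
            return null; // null signals that the document should be filtered out
        }
        return new Document(trimmed);
    }
}
```

Callers that invoke transform directly need a null check on the result, since null is a valid return value rather than an error.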
    • transformAll

      default List<Document> transformAll(List<Document> documents)
      Transforms all the provided documents.
      Parameters:
      documents - A list of documents to be transformed.
      Returns:
      A list of transformed documents. The returned list may be shorter or longer than the input list. Returns an empty list if every document was filtered out.
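
One plausible shape for the default transformAll is sketched below: apply transform to each document and skip null results. This is an assumption based on the documented behavior, not the library's confirmed source, and the Document and DocumentTransformer types are again simplified stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the library's Document type.
final class Document {
    private final String text;

    Document(String text) {
        this.text = text;
    }

    String text() {
        return text;
    }
}

// Stand-in interface with an assumed default implementation of transformAll.
interface DocumentTransformer {
    Document transform(Document document);

    // Assumed behavior: transform each document in order and drop those mapped to null.
    default List<Document> transformAll(List<Document> documents) {
        List<Document> result = new ArrayList<>();
        for (Document document : documents) {
            Document transformed = transform(document);
            if (transformed != null) {
                result.add(transformed);
            }
        }
        return result; // empty when every document was filtered out
    }
}
```

Under this sketch the result can only be shorter than or equal in length to the input. An implementation that splits one document into several would presumably override transformAll to return a longer list.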