Interface Document

All Known Implementing Classes:
DefaultDocument

public interface Document
Represents an unstructured piece of text that usually corresponds to the content of a single file. The text can originate from various sources, such as a text file, PDF, DOCX, or web page (HTML). Each document may have associated Metadata, such as its source, owner, or creation date.
  • Field Details

    • FILE_NAME

      static final String FILE_NAME
      Common metadata key for the name of the file from which the document was loaded.
    • ABSOLUTE_DIRECTORY_PATH

      static final String ABSOLUTE_DIRECTORY_PATH
      Common metadata key for the absolute path of the directory from which the document was loaded.
    • URL

      static final String URL
      Common metadata key for the URL from which the document was loaded.
  • Method Details

    • text

      String text()
      Returns the text of this document.
      Returns:
      the text.
    • metadata

      Metadata metadata()
      Returns the metadata associated with this document.
      Returns:
      the metadata.
    • metadata

      @Deprecated(forRemoval=true) default String metadata(String key)
      Deprecated, for removal: This API element is subject to removal in a future version.
      Looks up the metadata value for the given key.
      Parameters:
      key - the key to look up.
      Returns:
      the metadata value for the given key, or null if the key is not present.
    • toTextSegment

      default TextSegment toTextSegment()
      Builds a TextSegment from this document.
      Returns:
      a TextSegment
    • from

      static Document from(String text)
      Creates a new Document from the given text.

      The created document will have empty metadata.

      Parameters:
      text - the text of the document.
      Returns:
      a new Document.
    • from

      static Document from(String text, Metadata metadata)
      Creates a new Document from the given text and metadata.
      Parameters:
      text - the text of the document.
      metadata - the metadata of the document.
      Returns:
      a new Document.
    • document

      static Document document(String text)
      Creates a new Document from the given text.

      The created document will have empty metadata.

      Parameters:
      text - the text of the document.
      Returns:
      a new Document.
    • document

      static Document document(String text, Metadata metadata)
      Creates a new Document from the given text and metadata.
      Parameters:
      text - the text of the document.
      metadata - the metadata of the document.
      Returns:
      a new Document.
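
The contract described above can be illustrated with a minimal, self-contained sketch. It does not use the library itself: `SketchDocument` and `DocumentSketchDemo` are hypothetical stand-ins, a plain `Map<String, String>` stands in for the Metadata type, and the string values given to the key constants are assumptions. The sketch only shows the documented behavior: the one-argument factory yields empty metadata, and looking up a key that is not present yields null.

```java
import java.util.Map;

// Hypothetical stand-in for the Document contract described above.
// This is NOT the library's implementation.
interface SketchDocument {

    // Same constant names as the documented metadata keys;
    // the string values here are assumptions made for the sketch.
    String FILE_NAME = "file_name";
    String URL = "url";

    String text();

    // A plain map stands in for the Metadata type.
    Map<String, String> metadata();

    // Mirrors from(String): the created document has empty metadata.
    static SketchDocument from(String text) {
        return from(text, Map.of());
    }

    // Mirrors from(String, Metadata): text plus caller-supplied metadata.
    static SketchDocument from(String text, Map<String, String> metadata) {
        Map<String, String> copy = Map.copyOf(metadata); // immutable defensive copy
        return new SketchDocument() {
            @Override
            public String text() {
                return text;
            }

            @Override
            public Map<String, String> metadata() {
                return copy;
            }
        };
    }
}

class DocumentSketchDemo {
    public static void main(String[] args) {
        // Text only: metadata is empty.
        SketchDocument plain = SketchDocument.from("Hello, world.");
        System.out.println(plain.metadata().isEmpty());

        // Text plus metadata under one of the common keys.
        SketchDocument tagged = SketchDocument.from(
                "Release notes", Map.of(SketchDocument.FILE_NAME, "notes.txt"));
        System.out.println(tagged.metadata().get(SketchDocument.FILE_NAME));

        // A key that was never set is simply absent (null).
        System.out.println(tagged.metadata().get(SketchDocument.URL));
    }
}
```

With the real interface, the equivalent calls would be `Document.from(text)` and `Document.from(text, metadata)`, or the `document(...)` variants, which are documented identically.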