Interface Document

All Known Implementing Classes:
DefaultDocument

public interface Document
Represents an unstructured piece of text that usually corresponds to the content of a single file. The text can originate from various sources, such as a text file, PDF, DOCX, or a web page (HTML). Each document may have associated Metadata, including its source, owner, creation date, etc.
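
To make the file-to-document relationship concrete, here is a minimal sketch of reading a file's text and collecting the source-related metadata entries that would accompany it. It uses only the JDK, not this library. The class `FileMetadataSketch`, its helper methods, and the key strings are illustrative assumptions: this page names the metadata key constants but does not show their values.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class FileMetadataSketch {

    // Assumed key strings; this page names the constants but not their values.
    static final String FILE_NAME = "file_name";
    static final String ABSOLUTE_DIRECTORY_PATH = "absolute_directory_path";

    /** Reads the file as UTF-8; this is the part that would become the document text. */
    static String readText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Collects the file name and the absolute path of its parent directory. */
    static Map<String, String> sourceMetadata(Path file) {
        // Assumes the path has a file name and a parent (true for any regular file).
        Path absolute = file.toAbsolutePath();
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(FILE_NAME, absolute.getFileName().toString());
        metadata.put(ABSOLUTE_DIRECTORY_PATH, absolute.getParent().toString());
        return metadata;
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway file so the sketch runs anywhere.
        Path file = Files.createTempFile("example", ".txt");
        Files.write(file, "Some unstructured text.".getBytes(StandardCharsets.UTF_8));

        System.out.println(readText(file));
        System.out.println(sourceMetadata(file));

        Files.delete(file);
    }
}
```

Keeping the text and the metadata separate mirrors the split between text() and metadata() on this interface.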
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    ABSOLUTE_DIRECTORY_PATH
    Common metadata key for the absolute path of the directory from which the document was loaded.
    static final String
    FILE_NAME
    Common metadata key for the name of the file from which the document was loaded.
    static final String
    URL
    Common metadata key for the URL from which the document was loaded.
  • Method Summary

    Modifier and Type
    Method
    Description
    static Document
    document(String text)
    Creates a new Document from the given text.
    static Document
    document(String text, Metadata metadata)
    Creates a new Document from the given text.
    static Document
    from(String text)
    Creates a new Document from the given text.
    static Document
    from(String text, Metadata metadata)
    Creates a new Document from the given text.
    Metadata
    metadata()
    Returns the metadata associated with this document.
    String
    text()
    Returns the text of this document.
    default TextSegment
    toTextSegment()
    Builds a TextSegment from this document.
  • Field Details

    • FILE_NAME

      static final String FILE_NAME
      Common metadata key for the name of the file from which the document was loaded.
    • ABSOLUTE_DIRECTORY_PATH

      static final String ABSOLUTE_DIRECTORY_PATH
      Common metadata key for the absolute path of the directory from which the document was loaded.
    • URL

      static final String URL
      Common metadata key for the URL from which the document was loaded.
  • Method Details

    • text

      String text()
      Returns the text of this document.
      Returns:
      the text.
    • metadata

      Metadata metadata()
      Returns the metadata associated with this document.
      Returns:
      the metadata.
    • toTextSegment

      default TextSegment toTextSegment()
      Builds a TextSegment from this document.
      Returns:
      a TextSegment
    • from

      static Document from(String text)
      Creates a new Document from the given text.

      The created document will have empty metadata.

      Parameters:
      text - the text of the document.
      Returns:
      a new Document.
    • from

      static Document from(String text, Metadata metadata)
      Creates a new Document from the given text.
      Parameters:
      text - the text of the document.
      metadata - the metadata of the document.
      Returns:
      a new Document.
    • document

      static Document document(String text)
      Creates a new Document from the given text.

      The created document will have empty metadata.

      Parameters:
      text - the text of the document.
      Returns:
      a new Document.
    • document

      static Document document(String text, Metadata metadata)
      Creates a new Document from the given text.
      Parameters:
      text - the text of the document.
      metadata - the metadata of the document.
      Returns:
      a new Document.
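
The following self-contained sketch shows how the methods above fit together. It does not use the library's classes: `Metadata`, `TextSegment`, and `Document` here are simplified, hypothetical stand-ins that only mirror the signatures listed on this page. The key string behind `FILE_NAME` and the behavior of `toTextSegment()` (same text, copied metadata) are assumptions; the real implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class DocumentSketch {

    /** Hypothetical stand-in for Metadata: a small string-to-string map. */
    static final class Metadata {
        private final Map<String, String> entries = new HashMap<>();

        Metadata put(String key, String value) {
            entries.put(key, value);
            return this;
        }

        String get(String key) {
            return entries.get(key);
        }

        boolean isEmpty() {
            return entries.isEmpty();
        }

        Metadata copy() {
            Metadata copy = new Metadata();
            copy.entries.putAll(entries);
            return copy;
        }
    }

    /** Hypothetical stand-in for TextSegment: text plus metadata. */
    static final class TextSegment {
        final String text;
        final Metadata metadata;

        TextSegment(String text, Metadata metadata) {
            this.text = text;
            this.metadata = metadata;
        }
    }

    /** Stand-in interface mirroring the signatures documented on this page. */
    interface Document {
        String FILE_NAME = "file_name"; // assumed value; not shown on this page

        String text();

        Metadata metadata();

        // Assumption: the segment carries the same text and a copy of the metadata.
        default TextSegment toTextSegment() {
            return new TextSegment(text(), metadata().copy());
        }

        // The one-argument factory starts with empty metadata.
        static Document from(String text) {
            return from(text, new Metadata());
        }

        static Document from(String text, Metadata metadata) {
            return new Document() {
                @Override public String text() { return text; }
                @Override public Metadata metadata() { return metadata; }
            };
        }

        // document(...) is modeled as an alias of from(...), matching the identical descriptions.
        static Document document(String text) {
            return from(text);
        }

        static Document document(String text, Metadata metadata) {
            return from(text, metadata);
        }
    }

    public static void main(String[] args) {
        Document plain = Document.from("Hello, world.");
        System.out.println(plain.metadata().isEmpty()); // prints true

        Metadata metadata = new Metadata().put(Document.FILE_NAME, "notes.txt");
        Document tagged = Document.document("Meeting notes", metadata);
        System.out.println(tagged.metadata().get(Document.FILE_NAME)); // prints notes.txt

        TextSegment segment = tagged.toTextSegment();
        System.out.println(segment.text); // prints Meeting notes
    }
}
```

With the real library, the equivalent calls would be `Document.from(...)`, `Document.document(...)`, and `toTextSegment()` on the returned instance, using the library's own `Metadata` type instead of the stand-in.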