dev.langchain4j.model.gpullama3.GPULlama3StreamingChatModel

All Implemented Interfaces: StreamingChatModel, AutoCloseable

public class GPULlama3StreamingChatModel extends Object implements StreamingChatModel

GPULlama3 implementation of the langchain4j StreamingChatModel interface.

This model provides streaming chat capabilities using the GPULlama3.java library, supporting both CPU and GPU execution modes. The model automatically separates thinking content from actual responses.

Example usage:

GPULlama3StreamingChatModel model = GPULlama3StreamingChatModel.builder()
    .modelPath(Paths.get("path/to/model.gguf"))
    .temperature(0.7)
    .maxTokens(2048)
    .onGPU(true)
    .build();

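// StreamingChatModel.chat returns void; the response is delivered to the handler,
// a StreamingChatResponseHandler supplied by the caller.
model.chat(chatRequest, handler);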
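
A streaming model delivers its output through callbacks rather than a return value, so callers that need the complete text usually bridge the callback to something they can wait on. The following is a minimal sketch of that pattern only. DemoStreamingHandler, DemoStreamingModel, and StreamingDemo are hypothetical stand-ins written for this illustration; they are not the langchain4j StreamingChatResponseHandler or the GPULlama3 model, whose exact method sets should be checked in the library documentation.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a streaming callback; NOT the langchain4j interface.
interface DemoStreamingHandler {
    void onPartial(String piece);
    void onComplete(String fullText);
    void onError(Throwable error);
}

// Hypothetical stand-in for a streaming model; it ignores the prompt and emits fixed pieces.
class DemoStreamingModel {
    void chat(String prompt, DemoStreamingHandler handler) {
        try {
            StringBuilder full = new StringBuilder();
            for (String piece : new String[] {"Hel", "lo", "!"}) {
                handler.onPartial(piece);   // each piece is delivered as soon as it exists
                full.append(piece);
            }
            handler.onComplete(full.toString());
        } catch (RuntimeException e) {
            handler.onError(e);
        }
    }
}

class StreamingDemo {
    // Bridges the callback style to a future so the caller can wait for the full text.
    static CompletableFuture<String> ask(DemoStreamingModel model, String prompt) {
        CompletableFuture<String> result = new CompletableFuture<>();
        model.chat(prompt, new DemoStreamingHandler() {
            @Override public void onPartial(String piece) { System.out.print(piece); }
            @Override public void onComplete(String fullText) { result.complete(fullText); }
            @Override public void onError(Throwable error) { result.completeExceptionally(error); }
        });
        return result;
    }
}
```

Calling StreamingDemo.ask(new DemoStreamingModel(), "hi").join() blocks until the stand-in model signals completion and then yields the accumulated text.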

Nested Class Summary

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | GPULlama3StreamingChatModel.Builder | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static GPULlama3StreamingChatModel.Builder | builder() | |
| void | close() | Closes the model and releases all associated resources. |
| void | doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler) | |
| void | freeTornadoVMGPUResources() | Manually releases GPU resources allocated by TornadoVM. |
| org.beehive.gpullama3.model.Model | getModel() | |
| org.beehive.gpullama3.inference.sampler.Sampler | getSampler() | |
| void | init(Path modelPath, Double temperature, Double topP, Integer seed, Integer maxTokens, Boolean onGPU) | |
| String | modelResponse(ChatRequest request, IntConsumer tokenConsumer) | Generates a chat response from the model. |
| void | printLastMetrics() | |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface StreamingChatModel
chat, chat, chat, defaultRequestParameters, listeners, provider, supportedCapabilities

Method Details
- builder
  
  public static GPULlama3StreamingChatModel.Builder builder()
- doChat
  
  public void doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler)
  
  Specified by:
  
  doChat in interface StreamingChatModel
- init
  
  public void init(Path modelPath, Double temperature, Double topP, Integer seed, Integer maxTokens, Boolean onGPU)
- getModel
  
  public org.beehive.gpullama3.model.Model getModel()
- getSampler
  
  public org.beehive.gpullama3.inference.sampler.Sampler getSampler()
- modelResponse
  
  public String modelResponse(ChatRequest request, IntConsumer tokenConsumer)
  
  Generates a chat response from the model. Used by GPULlama3StreamingChatModel.
  
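  Parameters:
  
  request - the chat request to answer
  
  tokenConsumer - consumer that receives generated tokens as int values
  
  Returns:
  
  the generated response text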
- printLastMetrics
  
  public void printLastMetrics()
- freeTornadoVMGPUResources
  
  public void freeTornadoVMGPUResources()
  
  Manually releases GPU resources allocated by TornadoVM.
  This method can be called explicitly to free resources immediately; otherwise it is called automatically when the model is garbage collected. It is safe to call this method multiple times.
- close
  
  public void close()
  
  Closes the model and releases all associated resources.
  This method implements AutoCloseable, allowing the model to be used with try-with-resources statements for automatic resource management.
  
  Specified by:
  
  close in interface AutoCloseable
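
The two guarantees described above (a release method that is safe to call repeatedly, and close() usable in try-with-resources) are commonly combined using an atomic flag. The sketch below illustrates that general pattern under stated assumptions; it is not the library's implementation. DemoGpuModel, freeGpuResources, releaseCount, and CloseDemo are hypothetical names, and the assumption that close() delegates to the release method is made only for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a model that owns GPU resources; NOT the GPULlama3 class.
class DemoGpuModel implements AutoCloseable {
    private final AtomicBoolean released = new AtomicBoolean(false);
    // Counts how many times the underlying release work actually ran.
    final AtomicInteger releaseCount = new AtomicInteger();

    // Idempotent: only the first call performs the release.
    void freeGpuResources() {
        if (released.compareAndSet(false, true)) {
            releaseCount.incrementAndGet(); // placeholder for the real release work
        }
    }

    // Assumption for this sketch: close() simply delegates to the release method.
    @Override
    public void close() {
        freeGpuResources();
    }
}

class CloseDemo {
    static int releasesAfterUse() {
        DemoGpuModel model = new DemoGpuModel();
        try (DemoGpuModel m = model) {
            m.freeGpuResources(); // explicit early release
        } // close() runs here and finds nothing left to release
        return model.releaseCount.get();
    }
}
```

In this sketch the release work runs once even though it is requested twice, once explicitly and once by the implicit close() at the end of the try block.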
