RagUser
RAG user agent.
It extends a user agent and has RAG related parameters (retrieve_config).
WaldiezRagUser
Bases: WaldiezAgent
RAG user agent.
It extends a user agent and has RAG related parameters.
Attributes:
Name | Type | Description |
---|---|---|
agent_type | Literal['rag_user'] | The agent type: 'rag_user' for a RAG user agent. |
data | WaldiezRagUserData | The RAG user agent's data. See |
retrieve_config | WaldiezRagUserRetrieveConfig | The RAG user agent's retrieve config. |
retrieve_config: WaldiezRagUserRetrieveConfig
property
Get the retrieve config.
Returns:
Type | Description |
---|---|
WaldiezRagUserRetrieveConfig | The RAG user agent's retrieve config. |
Waldiez RAG user agent data.
WaldiezRagUserData
Bases: WaldiezUserProxyData
RAG user agent data.
The data for a RAG user agent.
Attributes:
Name | Type | Description |
---|---|---|
use_message_generator | bool | Whether to use the message generator in user's chats. Defaults to False. |
retrieve_config | WaldiezRagUserRetrieveConfig | The RAG user agent's retrieve config. |
RAG user agent retrieve config.
WaldiezRagUserChunkMode = Literal['multi_lines', 'one_line']
module-attribute
Possible chunk modes for the retrieve chat.
WaldiezRagUserRetrieveConfig
Bases: WaldiezBase
RAG user agent retrieve config.
Attributes:
Name | Type | Description |
---|---|---|
task | Literal['code', 'qa', 'default'] | The task of the retrieve chat. Possible values are 'code', 'qa' and 'default'. The system prompt differs by task. The default value is 'default', which supports both code and qa and provides source information at the end of the response. |
vector_db | Literal['chroma', 'pgvector', 'mongodb', 'qdrant'] | The vector db for the retrieve chat. |
db_config | Annotated[WaldiezVectorDbConfig, Field] | The config for the selected vector db. |
docs_path | Optional[Union[str, List[str]]] | The path to the docs directory. It can also be the path to a single file, the url to a single file or a list of directories, files and urls. Default is None, which works only if the collection is already created. |
new_docs | bool | When True, only adds new documents to the collection; when False, updates existing documents and adds new ones. Default is True. Document id is used to determine if a document is new or existing. By default, the id is the hash value of the content. |
model | Optional[str] | The model to use for the retrieve chat. If not provided, the default model gpt-4 is used. |
chunk_token_size | Optional[int] | The chunk token size for the retrieve chat. If not provided, a default size of max_tokens * 0.4 is used. |
context_max_tokens | Optional[int] | The context max token size for the retrieve chat. If not provided, a default size of max_tokens * 0.8 is used. |
chunk_mode | Optional[str] | The chunk mode for the retrieve chat. Possible values are 'multi_lines' and 'one_line'. If not provided, the default mode multi_lines is used. |
must_break_at_empty_line | bool | Chunk will only break at empty line if True. Default is True. If chunk_mode is 'one_line', this parameter will be ignored. |
use_custom_embedding | bool | Whether to use custom embedding for the retrieve chat. Default is False. If True, the embedding_function should be provided. |
embedding_function | Optional[str] | The embedding function for creating the vector db. Default is None, in which case SentenceTransformer with the given embedding_model is used. To use OpenAI, Cohere, HuggingFace or other embedding functions, pass one here, following the examples in https://docs.trychroma.com/guides/embeddings. |
customized_prompt | Optional[str] | The customized prompt for the retrieve chat. Default is None. |
customized_answer_prefix | Optional[str] | The customized answer prefix for the retrieve chat. Default is ''. If not '' and the customized_answer_prefix is not in the answer, Update Context will be triggered. |
update_context | bool | If False, will not apply Update Context for interactive retrieval. Default is True. |
collection_name | Optional[str] | The name of the collection. If not provided, the default name autogen-docs is used. |
get_or_create | bool | Whether to get the collection if it exists. Default is False. |
overwrite | bool | Whether to overwrite the collection if it exists. Default is False. Case 1: if the collection does not exist, it is created. Case 2: if the collection exists and overwrite is True, it is overwritten. Case 3: if the collection exists and overwrite is False, the collection is fetched when get_or_create is True; otherwise a ValueError is raised. |
use_custom_token_count | bool | Whether to use custom token count function for the retrieve chat. Default is False. If True, the custom_token_count_function should be provided. |
custom_token_count_function | Optional[str] | A custom function to count the number of tokens in a string. The function should take (text: str, model: str) as input and return the token count (int). retrieve_config['model'] is passed to the function. Default is autogen.token_count_utils.count_token, which uses tiktoken and may not be accurate for non-OpenAI models. |
use_custom_text_split | bool | Whether to use custom text split function for the retrieve chat. Default is False. If True, the custom_text_split_function should be provided. |
custom_text_split_function | Optional[str] | A custom function to split a string into a list of strings. Default is None, will use the default function in autogen.retrieve_utils.split_text_to_chunks. |
custom_text_types | Optional[List[str]] | A list of file types to be processed. Default is autogen.retrieve_utils.TEXT_FORMATS. This only applies to files under the directories in docs_path. Explicitly included files and urls will be chunked regardless of their types. |
recursive | bool | Whether to search documents recursively in the docs_path. Default is True. |
distance_threshold | float | The threshold for the distance score; only results with a distance smaller than it are returned. Ignored if < 0. Default is -1. |
embedding_function_string | Optional[str] | The embedding function string (if use_custom_embedding is True). |
token_count_function_string | Optional[str] | The token count function string (if use_custom_token_count is True). |
text_split_function_string | Optional[str] | The text split function string (if use_custom_text_split is True). |
n_results | Optional[int] | The number of results to return. Default is None, which will return all |
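
The three cases described for overwrite and get_or_create can be read as a small decision procedure. The sketch below is illustrative only: resolve_collection_action is a hypothetical helper that is not part of waldiez or autogen, and the returned labels are made up for the example.

```python
def resolve_collection_action(exists: bool, overwrite: bool, get_or_create: bool) -> str:
    """Return the action implied by the flags (hypothetical helper, illustration only)."""
    if not exists:
        # Case 1: the collection does not exist, so it is created.
        return "create"
    if overwrite:
        # Case 2: the collection exists and overwrite is True.
        return "overwrite"
    if get_or_create:
        # Case 3: the collection exists, overwrite is False, get_or_create is True.
        return "get"
    # Case 3, otherwise: neither flag allows reusing the existing collection.
    raise ValueError("Collection already exists.")
```

With both flags left at their documented default of False, an existing collection therefore leads to a ValueError in this reading.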
Methods:
Name | Description |
---|---|
validate_custom_embedding_function | Validate the custom embedding function. |
validate_custom_token_count_function | Validate the custom token count function. |
validate_custom_text_split_function | Validate the custom text split function. |
validate_rag_user_data | Validate the RAG user data. |
embedding_function_string: Optional[str]
property
get_custom_embedding_function(name_prefix: Optional[str] = None, name_suffix: Optional[str] = None) -> Tuple[str, str]
Generate the custom embedding function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name_prefix | Optional[str] | The function name prefix. | None |
name_suffix | Optional[str] | The function name suffix. | None |
Returns:
Type | Description |
---|---|
Tuple[str, str] | The custom embedding function and the function name. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
get_custom_text_split_function(name_prefix: Optional[str] = None, name_suffix: Optional[str] = None) -> Tuple[str, str]
Generate the custom text split function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name_prefix | Optional[str] | The function name prefix. | None |
name_suffix | Optional[str] | The function name suffix. | None |
Returns:
Type | Description |
---|---|
Tuple[str, str] | The custom text split function and the function name. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
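
As a rough illustration of what "a custom function to split a string into a list of strings" might look like, here is a minimal sketch that splits on blank lines. The name split_on_blank_lines and its single-argument signature are assumptions made for this example; the function waldiez validates may require a specific name and additional parameters.

```python
from typing import List


def split_on_blank_lines(text: str) -> List[str]:
    """Split text into chunks at blank lines (hypothetical example, not the waldiez API)."""
    # Paragraphs are separated by one empty line; surrounding whitespace is trimmed.
    chunks = [part.strip() for part in text.split("\n\n")]
    # Empty chunks (for example from consecutive blank lines) are dropped.
    return [chunk for chunk in chunks if chunk]
```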
get_custom_token_count_function(name_prefix: Optional[str] = None, name_suffix: Optional[str] = None) -> Tuple[str, str]
Generate the custom token count function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name_prefix | Optional[str] | The function name prefix. | None |
name_suffix | Optional[str] | The function name suffix. | None |
Returns:
Type | Description |
---|---|
Tuple[str, str] | The custom token count function and the function name. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
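
The custom_token_count_function attribute is documented as taking (text: str, model: str) and returning an int. A minimal sketch of a function with that shape follows; the name count_tokens_by_whitespace and the whitespace heuristic are assumptions for illustration, and the name waldiez expects in the function string may differ.

```python
def count_tokens_by_whitespace(text: str, model: str) -> int:
    """Approximate the token count by counting whitespace-separated words."""
    # The model argument is accepted to match the documented (text, model) inputs;
    # this rough counter ignores it.
    return len(text.split())
```

A heuristic like this is only a stand-in; a tokenizer that matches the actual model gives more reliable counts.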
text_split_function_string: Optional[str]
property
token_count_function_string: Optional[str]
property
validate_custom_embedding_function() -> None
Validate the custom embedding function.
Raises:
Type | Description |
---|---|
ValueError | If the validation fails. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
validate_custom_text_split_function() -> None
Validate the custom text split function.
Raises:
Type | Description |
---|---|
ValueError | If the validation fails. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
validate_custom_token_count_function() -> None
Validate the custom token count function.
Raises:
Type | Description |
---|---|
ValueError | If the validation fails. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
validate_docs_path() -> None
Validate the docs path.
Raises:
Type | Description |
---|---|
ValueError | If the validation fails. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
validate_rag_user_data() -> Self
Validate the RAG user data.
Raises:
Type | Description |
---|---|
ValueError | If the validation fails. |
Returns:
Type | Description |
---|---|
WaldiezRagUserRetrieveConfig | The validated retrieve config. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
WaldiezRagUserTask = Literal['code', 'qa', 'default']
module-attribute
Possible tasks for the retrieve chat.
WaldiezRagUserVectorDb = Literal['chroma', 'pgvector', 'mongodb', 'qdrant']
module-attribute
Possible vector dbs for the retrieve chat.
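
Because both aliases are typing.Literal types, their allowed values can be inspected at runtime with typing.get_args. The sketch below redefines the two aliases locally so that it runs without waldiez installed; is_allowed is a hypothetical helper, not part of the library.

```python
from typing import Literal, get_args

# Local copies of the documented aliases, so the example is self-contained.
WaldiezRagUserTask = Literal["code", "qa", "default"]
WaldiezRagUserVectorDb = Literal["chroma", "pgvector", "mongodb", "qdrant"]


def is_allowed(value: str, literal_type: object) -> bool:
    """Check whether value is one of the options of a Literal type (hypothetical helper)."""
    return value in get_args(literal_type)
```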
is_remote_path(path: str) -> Tuple[bool, bool]
Check if a path is a remote path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path to check. | required |
Returns:
Type | Description |
---|---|
Tuple[bool, bool] | If the path is a remote path and if it's a raw string. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
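
One way a check with this return shape could work is sketched below. This is not the waldiez implementation: the list of remote prefixes, the interpretation of "raw string" as a value starting with r" or r', and the name sketch_is_remote_path are all assumptions.

```python
from typing import Tuple

# Assumed set of non-local schemes; the real module may use a different list.
REMOTE_PREFIXES = ("http://", "https://", "ftp://", "ftps://", "sftp://")


def sketch_is_remote_path(path: str) -> Tuple[bool, bool]:
    """Return (is_remote, is_raw) for a path string (illustrative sketch only)."""
    # Assumption: a "raw string" is written like r"..." or r'...'.
    is_raw = path.startswith(("r'", 'r"'))
    # Drop the raw marker and opening quote before checking the scheme.
    candidate = path[2:] if is_raw else path
    return candidate.startswith(REMOTE_PREFIXES), is_raw
```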
remove_file_scheme(path: str) -> str
Remove the file:// scheme from a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path to remove the scheme from. | required |
Returns:
Type | Description |
---|---|
str | The path without the scheme. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
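
A minimal sketch of removing the file:// scheme follows. The name sketch_remove_file_scheme is hypothetical, and the real function may handle more cases (for example Windows drive letters) than this plain prefix strip.

```python
def sketch_remove_file_scheme(path: str) -> str:
    """Strip a leading file:// from a path (illustrative sketch only)."""
    prefix = "file://"
    # Paths without the scheme are returned unchanged.
    return path[len(prefix):] if path.startswith(prefix) else path
```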
resolve_path(path: str, is_raw: bool, must_exist: bool) -> str
Try to resolve a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path to resolve. | required |
is_raw | bool | If the path is a raw string. | required |
must_exist | bool | If the path must exist. | required |
Returns:
Type | Description |
---|---|
str | The resolved path. |
Raises:
Type | Description |
---|---|
ValueError | If the path is not a valid local path. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
string_represents_folder(path: str) -> bool
Check if a string represents a folder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The string to check (does not need to exist). | required |
Returns:
Type | Description |
---|---|
bool | True if the path is likely a folder, False if it's likely a file. |
Source code in waldiez/models/agents/rag_user/retrieve_config.py
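
Since the string does not need to exist on disk, a check like this has to rely on the shape of the string alone. The heuristic below (a trailing separator or no file extension suggests a folder) is an assumption for illustration, not the waldiez implementation, and sketch_string_represents_folder is a hypothetical name.

```python
import os
from pathlib import PurePath


def sketch_string_represents_folder(path: str) -> bool:
    """Guess whether a path string names a folder (illustrative heuristic only)."""
    # A trailing separator is a strong hint that a directory is meant.
    if path.endswith(("/", os.sep)):
        return True
    # Otherwise, treat names without a file extension as folders.
    return PurePath(path).suffix == ""
```

A heuristic of this kind misclassifies extensionless files such as Makefile, which is why the documented return value is phrased as "likely".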
The vector db config for the RAG user agent.
WaldiezRagUserVectorDbConfig
Bases: WaldiezBase
The config for the vector db.
Attributes:
Name | Type | Description |
---|---|---|
model | str | The model to use for the vector db embeddings. |
use_memory | bool | Whether to use memory for the vector db (if |
use_local_storage | bool | Whether to use local storage for the db (if |
local_storage_path | Optional[str] | The path to the local storage for the vector db (if |
connection_url | Optional[str] | The connection url for the vector db. |
wait_until_index_ready | Optional[float] | Blocking call to wait until the database indexes are ready (if |
wait_until_document_ready | Optional[float] | Blocking call to wait until the database documents are ready (if |
metadata | Optional[Dict[str, Any]] | The metadata to use for the vector db. Example: {"hnsw:space": "ip", "hnsw:construction_ef": 30, "hnsw:M": 32} |
Methods:
Name | Description |
---|---|
validate_vector_db_config | Validate the vector db config. |
validate_vector_db_config() -> Self
Validate the vector db config.
If local storage is used, make sure the path is provided, and make it absolute if it is not already.
Returns:
Type | Description |
---|---|
WaldiezRagUserVectorDbConfig | The vector db config. |
Raises:
Type | Description |
---|---|
ValueError | If the validation fails. |
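
The local storage rule described for validate_vector_db_config can be sketched with a plain dataclass. VectorDbConfigSketch and its validate method are hypothetical stand-ins with only two of the documented fields; the actual class derives from WaldiezBase and may validate more than this.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorDbConfigSketch:
    """Stand-in for the vector db config, reduced to the local storage fields."""

    use_local_storage: bool = False
    local_storage_path: Optional[str] = None

    def validate(self) -> "VectorDbConfigSketch":
        """Require a path when local storage is used and make it absolute."""
        if self.use_local_storage:
            if not self.local_storage_path:
                raise ValueError("local_storage_path is required when use_local_storage is True")
            # os.path.abspath leaves an already absolute path unchanged (apart from normalization).
            self.local_storage_path = os.path.abspath(self.local_storage_path)
        return self
```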