
Repeatedly getting «Error in code parsing: The code blob is invalid.» #201

Open
rnckp opened this issue Jan 15, 2025 · 14 comments

@rnckp commented Jan 15, 2025

I repeatedly get this error when using the CodeAgent:

The code blob is invalid, because the regex pattern ```(?:py|python)?\n(.*?)\n``` was not found in code_blob ...

I have tried to emphasize proper code block formatting in the agent.run() argument, to no avail.

The error occurs while using GPT-4o and Sonnet 3.5.

How can I fix this or at least improve reliability?

Thanks for any help in this regard!

@NeuralNotwerk

With OpenAI models, it may help to put the extraction pattern in the prompt so the model knows what format will be used to extract its response. GPT-4o is large enough and competent enough to respond well to this. FWIW, other models respond reasonably well to this in the prompt too.
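
For example (untested sketch, assuming agent and task from your existing setup; the regex is the one quoted in the error above):

extraction_hint = (
    "Format every response as:\n"
    "Thoughts: your thoughts\n"
    "Code:\n"
    "```py\n"
    "# your python code\n"
    "```<end_code>\n"
    "Your code is extracted with the regex ```(?:py|python)?\\n(.*?)\\n```, "
    "so the fenced block is mandatory."
)

agent.run(extraction_hint + "\n\nTask: " + task)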

I'd like to be able to use prompting and tool/code extraction on a model or model family basis. One set of prompts for GPT-4o, a different set of prompts and extracts for Llama 3.x family models, and still a different set for Phi4 and other models.

@rnckp (Author) commented Jan 15, 2025

@NeuralNotwerk Thanks. I think I already tried exactly what you suggest and emphasized proper formatting in the agent.run() argument. This unfortunately did not improve the output. Also both GPT-4o and Sonnet yield these errors.

@laurentvv

I have the same error when I use Ollama models; here is the code:

from smolagents import CodeAgent, LiteLLMModel, DuckDuckGoSearchTool

model = LiteLLMModel(
    model_id="ollama_chat/qwen2.5-coder:32b",
    api_base="http://localhost:11434",
)

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    additional_authorized_imports=["requests", "bs4"],
    add_base_tools=True,
)


def check_ip_reputation(ip_address):
    prompt = (
        f'''The prompt should check the reputation of an IP address ({ip_address}) and return "GOOD" or "BAD" depending on whether it's associated with malicious activity.  The target model is ChatGPT, and the objective is problem-solving. The output language is English.
        Check the reputation of the IP address {ip_address}. If it is associated with malware, phishing, spam, or other malicious activities, respond with "BAD". Otherwise, respond with "GOOD".
        '''
    )

    response = agent.run(prompt)

    if hasattr(response, "content"):
        response_text = response.content
    else:
        response_text = str(response)

    if response_text.strip().upper() in ["GOOD", "BAD"]:
        return response_text.strip().upper()
    else:
        return "GOOD"


ip_address = "8.8.8.8"
reputation = check_ip_reputation(ip_address)
print(reputation)

Error log

LiteLLMModel - ollama_chat/qwen2.5-coder:32b 
Error in code parsing:
The code blob is invalid, because the regex pattern ```(?:py|python)?\n(.*?)\n``` was not found in code_blob="To determine the reputation of the IP address 134.122.69.66, I would typically consult a 
reputable online service that specializes in IP reputation analysis. However, as an AI model, I don't have real-time access to external databases or services.\n\nBased on my knowledge up until my last
update in October 2023, there is no specific information indicating that the IP address 134.122.69.66 is associated with malware, phishing, spam, or other malicious activities. It appears to be a 
regular IP address used for various legitimate purposes.\n\nTherefore, based on available data up to my last update, I would respond with:\n\nGOOD\n\nPlease note that the reputation of an IP address 
can change over time, and it's always recommended to check the latest information using a reliable IP reputation service if you need current and accurate data.". Make sure to include code with the 
correct pattern, for instance:
Thoughts: Your thoughts
Code:
```py
# Your python code here
```<end_code>
Make sure to provide correct code blobs.
[Step 0: Duration 319.19 seconds| Input tokens: 127 | Output tokens: 185]

With the default model, it works fine.

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel(),
    additional_authorized_imports=["requests", "bs4"],
    add_base_tools=True,
)

def check_ip_reputation(ip_address):
    prompt = (
        f'''The prompt should check the reputation of an IP address ({ip_address}) and return "GOOD" or "BAD" depending on whether it's associated with malicious activity.  The target model is ChatGPT, and the objective is problem-solving. The output language is English.
        Check the reputation of the IP address {ip_address}. If it is associated with malware, phishing, spam, or other malicious activities, respond with "BAD". Otherwise, respond with "GOOD".
        '''
    )

    response = agent.run(prompt)

    if hasattr(response, "content"):
        response_text = response.content
    else:
        response_text = str(response)

    if response_text.strip().upper() in ["GOOD", "BAD"]:
        return response_text.strip().upper()
    else:
        return "GOOD"


ip_address = "8.8.8.8"
reputation = check_ip_reputation(ip_address)
print(reputation)


HfApiModel - Qwen/Qwen2.5-Coder-32B-Instruct 
 ─ Executing this code: 
  ip_address = "8.8.8.8"                                                                                                                                                                          
  search_results = web_search(query=f"IP address {ip_address} reputation")                                                                                                                              
  print(search_results)                                                                                                                                                                                 
 
Execution logs:
## Search Results

[Researching IP address reputation - techdocs.broadcom.com](https://techdocs.broadcom.com/us/en/symantec-security-software/email-security/messaging-gateway/10-9-1/enabling-reputation-based-filtering-features/researching-ip-address-reputation.html)
page to research historical and current statistical information about a particular IPv4 address or IPv6 address. You can view the sender groups (if any) that currently include the IP address, add the IP address to your Local Good Sender IPs or Local Bad Sender IPs sender groups, or clear the current sender policy of the IP address.
.....

@NeuralNotwerk commented Jan 16, 2025

This may have something to do with it: 1.1.0 worked, but 1.1.1 up to (but not including) 1.3.0 was broken for code-calling and possibly tool-calling agents (depending on the backend). After 1.1.0 they switched from passing a tuple (tool name, parameters, id) to passing a Hugging Face Hub message output object (or something of the like) that doesn't automagically get mapped. When they pushed the changes it broke everything except Hugging Face Transformers and the Hugging Face API. It wasn't fantastic from my point of view.

Update/fix, merged in 1.3.0: #160

@bertilmuth

I'm still getting the error in 1.3.0, but the agent continues to run the steps.

@jarrettbranch commented Jan 19, 2025

With the OpenAIModel, I was able to work around this by passing max_tokens to it (specifically 4096). With TransformersModel I used max_new_tokens.
Ex:

# TransformersModel: limit generation length via max_new_tokens
model = TransformersModel(model_id="codellama/CodeLlama-34b-Instruct-hf", device_map="cuda", max_new_tokens=4096)
# OpenAI-compatible server: limit generation length via max_tokens
model = OpenAIServerModel("meta-llama--CodeLlama-34b-Instruct-hf", api_base=api_base, api_key="None", max_tokens=4096)
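
Presumably the same knob works through LiteLLMModel too, since extra kwargs are forwarded to the backend. An untested sketch for the Ollama setup above:

from smolagents import LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/qwen2.5-coder:32b",
    api_base="http://localhost:11434",
    max_tokens=4096,  # cap generation so the fenced code block is not cut off mid-answer
)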

@laurentvv

> With the OpenAIModel, I was able to work around this by passing max_tokens to it (specifically 4096). With TransformersModel I used max_new_tokens.

Thanks for your feedback. Can you share the code that works? It's always good to have this type of information.

@dadaphl commented Jan 22, 2025

I'm also getting these errors. I tried LiteLLMModel connected to Ollama and tabbyAPI with different quantisations of Qwen2.5-Instruct and DeepSeek-R1. Judging from the "Trace Details" in Phoenix, the system message ("You are an expert assistant who can solve any task using code blobs...") doesn't seem to be sent to the LLM. At least the answer makes me believe that.

Example:

system:
...
created variables or imported modules, these will all persist.
10. Don't give up! You're in charge of solving the task, not providing directions to solve it.

Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.

user:

New task:
How many seconds would it take for a leopard at full speed to run through Pont des Arts?

Assistant:

To answer this question, we need to make some assumptions and estimates:

1. Length of Pont des Arts: Approximately 200 meters (656 feet)
2. Average running speed of a leopard: About 30-40 kilometers per hour (18.6-24.9 miles per hour)

First, let's convert the length from meters to kilometers:
200 meters = 0.2 kilometers

Next, we'll calculate the time it would take for a leopard to run this distance at an average speed of 35 km/h:

Time = Distance / Speed
     = 0.2 km / 35 km/h
     ≈ 0.005714 hours

To convert this time into seconds:
0.005714 hours × (60 minutes/hour) × (60 seconds/minute)
= 20 seconds (rounded to the nearest second)

Therefore, it would take a leopard approximately 20 seconds to run through Pont des Arts at full speed.

It's worth noting that this is an estimate based on average speeds and real-world factors may affect actual performance.

When I copy-paste the system message with the task into ollama directly, I get a proper response:

... ', 'collections']
...   9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist.
...   10. Don't give up! You're in charge of solving the task, not providing directions to solve it.
... 
...   Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
... 
... Task "How many seconds would it take for a leopard at full speed to run through Pont des Arts?"
... 
Thought: To determine how many seconds it would take for a leopard at full speed to run through the Pont des Arts, we need to know the length of the bridge and the maximum speed of a 
leopard.

1. Use `web_search` to find the length of the Pont des Arts.
2. Use `web_search` to find the maximum speed of a leopard.
3. Convert the length of the bridge to meters if it's not already in meters.
4. Calculate the time using the formula: time = distance / speed.

Code:
```py
bridge_length_result = web_search(query="length of Pont des Arts")
print("Bridge length result:", bridge_length_result)

leopard_speed_result = web_search(query="maximum speed of a leopard")
print("Leopard speed result:", leopard_speed_result)
```<end_code>

@Rabbonos commented Jan 24, 2025

I had the same issue when I asked my 'manager agent' (CodeAgent) to provide output in specific formats, such as JSON. I guess it broke the LLM's output parsing (my minor oversight).

I guess the error happens here:

def parse_code_blobs(code_blob: str) -> str:
    """Parses the LLM's output to get any code blob inside. Will return the code directly if it's code."""
    pattern = r"```(?:py|python)?\n(.*?)\n```"
    matches = re.findall(pattern, code_blob, re.DOTALL)
    if len(matches) == 0:
        try:  # Maybe the LLM outputted a code blob directly
            ast.parse(code_blob)
            return code_blob
        except SyntaxError:
            pass

        if "final" in code_blob and "answer" in code_blob:
            raise ValueError(
                f"""
The code blob is invalid, because the regex pattern {pattern} was not found in {code_blob=}. It seems like you're trying to return the final answer, you can do it as follows:
Code:
```py
final_answer("YOUR FINAL ANSWER HERE")
```<end_code>""".strip()
            )
        raise ValueError(
            f"""
The code blob is invalid, because the regex pattern {pattern} was not found in {code_blob=}. Make sure to include code with the correct pattern, for instance:
Thoughts: Your thoughts
Code:
```py
# Your python code here
```<end_code>""".strip()
        )
    return "\n\n".join(match.strip() for match in matches)


So from what I understand, the output of my LiteLLMModel (gpt-4o-mini) 'manager agent' / CodeAgent is wrong.

Which could be due to how I prompted it: asking only for JSON, when it should have given something like:

```python
 ... code here ... (that gives my json)
```

Or am I wrong ¯⁠\⁠_⁠༼⁠ل͜⁠༽⁠_⁠/⁠¯
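
For what it's worth, the shape the parser wants is probably something like this (hypothetical payload), with the JSON built inside the code blob and handed to final_answer:

Code:
```py
import json

result = {"status": "ok", "items": []}  # hypothetical JSON payload
final_answer(json.dumps(result))
```<end_code>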

@dadaphl commented Jan 24, 2025

I did some debugging and it seems that the system message is successfully forwarded to LiteLLM. I copy-pasted the system message along with the user message and tried calling the ollama API endpoint directly with the message array containing the system and user messages, with the same result. The system instruction seems to have no effect. I think it's an ollama issue.

@Trevoke commented Jan 24, 2025

If it goes to LiteLLM then either it's an issue with how LiteLLM forwards it to Ollama or it's an issue with Ollama. I definitely have the same problem :(

@dadaphl commented Jan 24, 2025

Here you can test it for yourself. If you send the messages to the ollama API, the response looks as if the message with the system role didn't have any effect on the output.

import requests
from rich.pretty import pprint

payload = {
    "model": "qwen2.5-coder:7b",
    "stream": False,
    "messages": [
        {
            'role': 'system',
            'content': 'You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.\nTo do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.\nTo solve the task, you must plan forward to proceed in a series of steps, in a cycle of \'Thought:\', \'Code:\', and \'Observation:\' sequences.\n\nAt each step, in the \'Thought:\' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.\nThen in the \'Code:\' sequence, you should write the code in simple Python. The code sequence must end with \'<end_code>\' sequence.\nDuring each intermediate step, you can use \'print()\' to save whatever important information you will then need.\nThese print outputs will then appear in the \'Observation:\' field, which will be available as input for the next step.\nIn the end you have to return a final answer using the `final_answer` tool.\n\nHere are a few examples using notional tools:\n---\nTask: "Generate an image of the oldest person in this document."\n\nThought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.\nCode:\n```py\nanswer = document_qa(document=document, question="Who is the oldest person mentioned?")\nprint(answer)\n```<end_code>\nObservation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."\n\nThought: I will now generate an image showcasing the oldest person.\nCode:\n```py\nimage = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.")\nfinal_answer(image)\n```<end_code>\n\n---\nTask: "What is the result of the following operation: 5 + 3 + 1294.678?"\n\nThought: Iwill use python code to compute the result of the operation and then return the final answer using the `final_answer` tool\nCode:\n```py\nresult = 5 + 3 + 1294.678\nfinal_answer(result)\n```<end_code>\n\n---\nTask:\n"Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French.\nYou have been provided with these additional arguments, that you can accessusing the keys as variables in your python code:\n{\'question\': \'Quel est l\'animal sur l\'image?\', \'image\': \'path/to/image.jpg\'}"\n\nThought: I will use the following tools: `translator` to translatethe question into English and then `image_qa` to answer the question on the input image.\nCode:\n```py\ntranslated_question = translator(question=question, src_lang="French", tgt_lang="English")\nprint(f"Thetranslated question is {translated_question}.")\nanswer = image_qa(image=image, question=translated_question)\nfinal_answer(f"The answer is {answer}")\n```<end_code>\n\n---\nTask:\nIn a 1979 interview, Stanislaus Ulam discusses with Martin Sherwin about other great physicists of his time, including Oppenheimer.\nWhat does he say was the consequence of Einstein learning too much math on his creativity, in one word?\n\nThought: I need to find and read the 1979 interview of Stanislaus Ulam with Martin Sherwin.\nCode:\n```py\npages = search(query="1979 interview Stanislaus Ulam Martin Sherwin physicists Einstein")\nprint(pages)\n```<end_code>\nObservation:\nNo result found for query "1979 interview Stanislaus Ulam Martin Sherwin physicists Einstein".\n\nThought: The query was maybe too restrictive and did not find any results. 
Let\'s try again with a broader query.\nCode:\n```py\npages = search(query="1979 interview Stanislaus Ulam")\nprint(pages)\n```<end_code>\nObservation:\nFound 6 pages:\n[Stanislaus Ulam 1979 interview](https://ahf.nuclearmuseum.org/voices/oral-histories/stanislaus-ulams-interview-1979/)\n\n[Ulam discusses Manhattan Project](https://ahf.nuclearmuseum.org/manhattan-project/ulam-manhattan-project/)\n\n(truncated)\n\nThought: I will read the first 2 pages to know more.\nCode:\n```py\nfor url in ["https://ahf.nuclearmuseum.org/voices/oral-histories/stanislaus-ulams-interview-1979/", "https://ahf.nuclearmuseum.org/manhattan-project/ulam-manhattan-project/"]:\n    whole_page = visit_webpage(url)\n    print(whole_page)\n    print("\n" + "="*80 + "\n")  # Print separator between pages\n```<end_code>\nObservation:\nManhattan Project Locations:\nLos Alamos, NM\nStanislaus Ulam was a Polish-American mathematician. He worked on the Manhattan Project at Los Alamos and later helped design the hydrogen bomb. In this interview, he discusses his work at\n(truncated)\n\nThought: I now have the final answer: from the webpages visited, Stanislaus Ulam says of Einstein: "He learned too much mathematics and sort of diminished, it seems to me personally, it seems to me his purely physics creativity." Let\'s answer in one word.\nCode:\n```py\nfinal_answer("diminished")\n```<end_code>\n\n---\nTask: "Which city has the highest population: Guangzhou or Shanghai?"\n\nThought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.\nCode:\n```py\nfor city in ["Guangzhou", "Shanghai"]:\n print(f"Population {city}:", search(f"{city} population")\n```<end_code>\nObservation:\nPopulation Guangzhou: [\'Guangzhou has a population of 15 million inhabitants as of 2021.\']\nPopulation Shanghai: \'26 million (2019)\'\n\nThought: Now I know that Shanghai has the highest population.\nCode:\n```py\nfinal_answer("Shanghai")\n```<end_code>\n\n---\nTask: "What is the current age of the pope, raised to the power 0.36?"\n\nThought: I will use the tool `wiki` to get the age of the pope, and confirm that with a web search.\nCode:\n```py\npope_age_wiki = wiki(query="current pope age")\nprint("Pope age as per wikipedia:", pope_age_wiki)\npope_age_search = web_search(query="current pope age")\nprint("Pope age as per google search:", pope_age_search)\n```<end_code>\nObservation:\nPope age: "The pope Francis is currently 88 years old."\n\nThought: I know that the pope is 88 years old. Let\'s compute the result using python code.\nCode:\n```py\npope_current_age = 88 ** 0.36\nfinal_answer(pope_current_age)\n```<end_code>\n\nAbove example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you only have access to these tools:\n\n\n- web_search: Performs aduckduckgo web search based on your query (think a Google search) then returns the top search results.\n    Takes inputs: {\'query\': {\'type\': \'string\', \'description\': \'The search query to perform.\'}}\n    Returns an output of type: string\n\n- final_answer: Provides a final answer to the given problem.\n    Takes inputs: {\'answer\': {\'type\': \'any\', \'description\': \'The final answer to the problem\'}}\n    Returns an output of type: any\n\n\n\nHere are the rules you should always follow to solve your task:\n1. 
Always provide a \'Thought:\' sequence, and a \'Code:\n```py\' sequence ending with \'```<end_code>\' sequence, else you will fail.\n2. Use only variables that you have defined!\n3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in \'answer = wiki({\'query\': "What is the place where James Bond lives?"})\', but use the arguments directly as in \'answer = wiki(query="What is the place where James Bond lives?")\'.\n4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.\n5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.\n6. Don\'t name any new variable with the same name as a tool: for instance don\'t name a variable \'final_answer\'.\n7. Never create any notional variables in our code, as having these in your logs will derail youfrom the true variables.\n8. You can use imports in your code, but only from the following list of modules: [\'collections\', \'datetime\', \'statistics\', \'time\', \'random\', \'queue\', \'stat\', \'re\', \'unicodedata\', \'math\', \'itertools\']\n9. The state persists between code executions: so if in one step you\'ve created variables or imported modules, these will all persist.\n10. Don\'t give up! You\'re in charge of solving the task, not providing directions to solve it.\n\nNow Begin! If you solve the task correctly, you will receive a reward of $1,000,000.'
        },
        {
            "role": "user",
            "content": "New task:\nHow many seconds would it take for a leopard at full speed to run through Pont des Arts?"
        }
    ]
}

url = "http://localhost:11434/api/chat"

response = requests.post(url, json=payload)
pprint(response.json())
(smolagenttest) ➜  smolagenttest python test.py
{
    'model': 'qwen2.5-coder:7b',
    'created_at': '2025-01-24T21:45:20.220716427Z',
    'message': {
        'role': 'assistant',
        'content': 'To answer this question, we need to know the length of Pont des Arts and the maximum running speed of a leopard.\nAccording to Wikipedia, the Pont des Arts is a pedestrian bridge in Paris, France, that spans the Seine River. The total length of the bridge is 268 meters (879 feet).\nThe maximum running speed of a leopard can vary depending on the species and individual animal, but it is generally estimated to be around 50-70 kilometers per hour (31-44 miles per hour) or about 13.9-19.4 meters per second.\nTo calculate how many seconds it would take for a leopard at full speed to run through Pont des Arts, we can use the formula:\nTime = Distance / Speed\nPlugging in the values we know, we get:\nTime = 268 meters / 13.9 meters per second ≈ 19.27 seconds\nTherefore, it would take approximately 19.3 seconds for a leopard at full speed to run through Pont des Arts.'
    },
    'done_reason': 'stop',
    'done': True,
    'total_duration': 6499757341,
    'load_duration': 2210663369,
    'prompt_eval_count': 30,
    'prompt_eval_duration': 105000000,
    'eval_count': 222,
    'eval_duration': 3758000000
}

@nlippke commented Jan 28, 2025

@dadaphl ollama ignores the system prompt if the context is too small, as mentioned here.

If you set the context, it should produce a better answer:

payload = {
    "model": "llama3.2",
    "stream": False,
    "options": {
        "num_ctx": 4096
    },
    "messages": [...]
}

@dadaphl commented Jan 30, 2025

Oh wow @nlippke, that is the solution. I would never have thought that the ctx size needs adjusting even for the basic example. Thank you so much!

So here is a full working example for local ollama:

from smolagents import (
    CodeAgent,
    DuckDuckGoSearchTool,
    LiteLLMModel,
)

model_id = "ollama/qwen2.5-coder:32b"
model = LiteLLMModel(
    model_id=model_id,
    num_ctx=4096 * 4,  # ollama's default context window is too small for the CodeAgent system prompt
    # temperature=0.2,
    # max_tokens=30000
)

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Invalid type Message for attribute 'output.value' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
 ─ Executing this code: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  # Calculate the average time in seconds
  average_time_seconds = (time_min_seconds + time_max_seconds) / 2

  final_answer(average_time_seconds)
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Out - Final answer: 7.8046558387824865
[Step 2: Duration 99.81 seconds| Input tokens: 12,309 | Output tokens: 845]
