How to Make the Data Processing Aspect of the Agent More Robust

Data processing is handled by process_table_query. Here are some ideas to make it much more capable and robust without bloating the code too much:

1. Smarter routing before hitting the LLM

Right now every table query goes straight to “LLM generates pandas code”. You can get a lot more robustness by handling simple patterns yourself first.

1.1. Direct "show me table X" shortcuts. Some requests simply ask to display a dataframe at a given step; these don't need the LLM or generated code at all. Detect them and return the table directly, e.g. return {"success": True, "dataframe": df, "description": f"Table: {table_name}\nRows: {len(df)}", "table_name": table_name} (see the sketch below).
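
A minimal sketch of that shortcut. The helper name try_direct_table_display and the tables dict (table name to DataFrame) are assumptions; adapt the lookup to however process_table_query actually stores its dataframes.

```python
import re
import pandas as pd

# Intercept plain "show (me) table X" requests before calling the LLM.
TABLE_DISPLAY_RE = re.compile(
    r"^\s*show\s+(?:me\s+)?(?:the\s+)?table\s+(\w+)\s*$", re.IGNORECASE
)

def try_direct_table_display(query: str, tables: dict[str, pd.DataFrame]):
    """Return a result dict for simple display requests, else None."""
    match = TABLE_DISPLAY_RE.match(query)
    if not match:
        return None  # not a simple display request; use the normal LLM path
    table_name = match.group(1)
    df = tables.get(table_name)
    if df is None:
        return {"success": False, "error": f"Unknown table: {table_name}"}
    return {
        "success": True,
        "dataframe": df,
        "description": f"Table: {table_name}\nRows: {len(df)}",
        "table_name": table_name,
    }
```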

2. Make the LLM prompt more structured
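
As a rough illustration (the helper name build_table_prompt and the exact wording are assumptions, not part of the existing code): give the model the schema, a few sample rows, and a strict output contract instead of a free-form request.

```python
import pandas as pd

# Sketch of a more structured prompt: schema, sample rows, explicit output contract.
def build_table_prompt(query: str, table_name: str, df: pd.DataFrame) -> str:
    schema = "\n".join(f"- {col}: {dtype}" for col, dtype in df.dtypes.items())
    sample = df.head(3).to_string(index=False)
    return (
        f"You are given a pandas DataFrame named `df` (table '{table_name}').\n"
        f"Columns and dtypes:\n{schema}\n\n"
        f"First rows:\n{sample}\n\n"
        "Write Python code that answers the user's request using only `df` and pandas.\n"
        "Assign the final answer to a variable named `result`.\n"
        "Do not import anything, read files, or modify `df` in place.\n\n"
        f"User request: {query}"
    )
```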

3. Validate the generated code’s effects
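
A minimal validation sketch, assuming the generated code is expected to assign its answer to a variable named result (as in the prompt contract above). The function name run_and_validate is hypothetical.

```python
import pandas as pd

# Run the generated code in a small namespace and check it produced something usable.
def run_and_validate(code: str, df: pd.DataFrame):
    namespace = {"df": df.copy(), "pd": pd}
    try:
        exec(code, namespace)  # consider a stricter sandbox for untrusted code
    except Exception as exc:
        return {"success": False, "error": f"Generated code raised: {exc}"}

    result = namespace.get("result")
    if result is None:
        return {"success": False, "error": "Generated code did not set `result`."}
    if isinstance(result, pd.DataFrame) and result.empty:
        return {"success": False, "error": "Query ran but returned no rows."}
    return {"success": True, "result": result}
```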

4. Column-awareness and error messages

You can precompute some metadata:

  • numeric_cols = df.select_dtypes(include=[np.number]).columns
  • categorical_cols = df.select_dtypes(exclude=[np.number]).columns (everything non-numeric)
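
A small sketch of how that metadata can feed both the prompt and the error messages; column_metadata and check_column are hypothetical helper names.

```python
import numpy as np
import pandas as pd

# Precompute column metadata once per table.
def column_metadata(df: pd.DataFrame) -> dict:
    numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
    categorical_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()
    return {"numeric": numeric_cols, "categorical": categorical_cols}

# Use it to produce a helpful message when a requested column does not exist.
def check_column(df: pd.DataFrame, column: str):
    if column in df.columns:
        return None
    meta = column_metadata(df)
    return (
        f"Column '{column}' not found. "
        f"Numeric columns: {meta['numeric']}; other columns: {meta['categorical']}"
    )
```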

5. Better handling of statistics vs table queries

The code already uses is_stats/result to distinguish the two. You can make this safer:
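
One way to do that (a sketch, since the exact is_stats/result structure isn't shown here) is to normalize whatever the generated code returns, so scalars, Series, and DataFrames all come back in a predictable shape instead of being rendered as a broken table.

```python
import pandas as pd

# Normalize the result of the generated code into one predictable shape.
def normalize_result(result):
    if isinstance(result, pd.DataFrame):
        return {"is_stats": False, "dataframe": result,
                "description": f"{len(result)} rows"}
    if isinstance(result, pd.Series):
        # A Series is usually a per-column statistic; show it as a small table.
        return {"is_stats": True, "dataframe": result.to_frame(name="value"),
                "description": "Summary statistic"}
    # Scalars (mean, count, max, ...) become plain text.
    return {"is_stats": True, "dataframe": None, "description": f"Result: {result}"}
```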

6. Support lightweight manual filters without LLM (optional)

To reduce dependence on the LLM and increase robustness:

  • Recognize a few very common patterns with simple regexes and handle them locally:
    • “show stocks where <column> > <number>”
    • “show rows where <column> = <value>”
    • “sort by <column> descending”
  • For those, you can:
    • Parse the column name and value.
    • Validate the column exists.
    • Run a small df[...] / df.sort_values(...) directly.

Only fall back to the LLM when you don’t recognize the pattern. This is a bigger addition, but you can start with the one or two patterns you use most often; a sketch follows below.
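
A sketch of that fallback with just two patterns (a numeric “greater than” filter and a sort). The helper name try_manual_query is hypothetical; anything unrecognized returns None so the caller can fall back to the LLM path.

```python
import re
import pandas as pd

GT_FILTER_RE = re.compile(r"show\s+(?:stocks|rows)\s+where\s+(\w+)\s*>\s*([\d.]+)", re.IGNORECASE)
SORT_RE = re.compile(r"sort\s+by\s+(\w+)\s+(descending|ascending)", re.IGNORECASE)

def try_manual_query(query: str, df: pd.DataFrame):
    # Pattern 1: "show stocks/rows where <column> > <number>"
    match = GT_FILTER_RE.search(query)
    if match:
        column, value = match.group(1), float(match.group(2))
        if column not in df.columns:
            return {"success": False, "error": f"Column '{column}' not found."}
        return {"success": True, "dataframe": df[df[column] > value]}

    # Pattern 2: "sort by <column> descending/ascending"
    match = SORT_RE.search(query)
    if match:
        column, direction = match.group(1), match.group(2).lower()
        if column not in df.columns:
            return {"success": False, "error": f"Column '{column}' not found."}
        return {"success": True,
                "dataframe": df.sort_values(column, ascending=direction == "ascending")}

    return None  # unrecognized pattern: fall back to the LLM
```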
