A public leaderboard for evaluating LLM-powered shopping agents on retrieval quality, rubric satisfaction, report faithfulness, and safety-critical compliance.
| Rank | Model | Organization | Category | AnswerMatch-F1 | AnswerMatch-P | AnswerMatch-R | SoP | Scenario-F1 | Scenario-P | Scenario-R | RV | Safety Pass Rate | Submission Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
To add your model to the leaderboard, send your model responses to zhangyuan.zhang@bytedance.com, formatted as described in Submission Format.
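
As a reading aid for the P, R, and F1 columns in the table, the sketch below shows the standard set-based definitions of precision and recall, with F1 as their harmonic mean. It is an illustration only: the function names and product IDs are hypothetical, and this page does not state how the leaderboard aggregates scores across queries (for example per-query averaging versus pooled counts), so this is not the official scoring code.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Set-based precision and recall for one query.

    `predicted` and `gold` are hypothetical sets of product IDs; the
    benchmark's real matching rule may differ.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


if __name__ == "__main__":
    # Hypothetical example: the agent returns two items, one of which is
    # among the three correct ones.
    p, r = precision_recall({"sku-1", "sku-2"}, {"sku-2", "sku-3", "sku-4"})
    print(f"P={p:.3f} R={r:.3f} F1={f1_score(p, r):.3f}")
```

Under these definitions a model with high precision but low recall still gets a low F1, which is why the table reports the P and R columns next to each F1 column.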