Interactive chat mode with separate performance tracking #209
Merged
Improvements to console UX:
- Redesigned help command with ASCII box formatting
- Added 'help ai' for AI-specific command reference
- Added 'help all' for comprehensive command list
- Added 'status' command for system/AI status overview
- Added 'clear'/'cls' for terminal clear (ANSI escape)
- Added 'version'/'ver' to show kernel version

Chat command improvements:
- Better usage examples when called without message
- Progress messages during model/tokenizer/engine init
- Improved output formatting with labeled sections
- Better error messages with actionable suggestions

Unknown command handling:
- Suggests similar commands for common typos
- Points users to 'help' for available commands
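The PR does not show how similar commands are found for typos. One common approach is to pick the known command with the smallest edit distance to the input. The sketch below assumes that approach; `known_commands`, `edit_distance`, `suggest_command`, and `MAX_CMD_LEN` are hypothetical names, and the command list is taken from the commit message rather than from the code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command table; names are taken from the commit message. */
static const char *const known_commands[] = {
    "help", "status", "clear", "cls", "version", "ver", "chat", "talk", "perf",
};

#define MAX_CMD_LEN 15

/* Levenshtein distance using two rolling rows (no heap allocation).
 * Inputs longer than MAX_CMD_LEN are reported as "too far". */
static size_t edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    if (la > MAX_CMD_LEN || lb > MAX_CMD_LEN)
        return MAX_CMD_LEN + 1;

    size_t prev[MAX_CMD_LEN + 1], cur[MAX_CMD_LEN + 1];
    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;

    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t best = prev[j - 1] + cost;      /* substitution */
            if (prev[j] + 1 < best)
                best = prev[j] + 1;                /* deletion */
            if (cur[j - 1] + 1 < best)
                best = cur[j - 1] + 1;             /* insertion */
            cur[j] = best;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Returns the closest known command within max_dist edits, or NULL. */
static const char *suggest_command(const char *input, size_t max_dist)
{
    const char *best = NULL;
    size_t best_dist = max_dist + 1;
    size_t count = sizeof known_commands / sizeof known_commands[0];

    for (size_t i = 0; i < count; i++) {
        size_t d = edit_distance(input, known_commands[i]);
        if (d < best_dist) {
            best_dist = d;
            best = known_commands[i];
        }
    }
    return best;
}
```

With a threshold of 2, a swapped-letter typo such as `hlep` resolves to `help`, while unrelated input returns NULL so the console can fall back to pointing at 'help'.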
New commands:
- 'talk' - Interactive chat session with continuous conversation loop
  - Auto-initializes model, tokenizer, and inference engine on start
  - Shows session banner and usage hints
  - Type 'perf' inline to see session stats
  - Type 'exit'/'quit'/'q' to leave with session summary
  - Tracks tokens/time per message and session totals
- 'perf' - Display chat performance metrics separately
  - Last message: prompt tokens, generated tokens, time, throughput
  - Session totals: messages, total tokens, average throughput
  - Keeps benchmarks out of the chat flow

Performance tracking:
- Added chat_perf_t struct to track timing
- rdtsc/cntvct_el0 cycle counting for accurate timing
- Session-level aggregation for average throughput

Chat command updated:
- Cleaner output (just the response, no extra formatting)
- Records timing but doesn't display it
- Points users to 'talk' for interactive mode
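The commit names chat_perf_t and the two counter sources but does not show the fields or the arithmetic. The following is a minimal sketch of how per-message timing and session aggregation could fit together. The field names, `read_ticks`, `chat_perf_record`, and `tokens_per_sec_x100` are all assumptions, as is the choice to count only generated tokens toward the session total. Throughput uses integer math scaled by 100, since floating point is often avoided in kernel code, and the counter frequency is passed in by the caller.

```c
#include <stdint.h>

/* Hypothetical layout: the PR names chat_perf_t but does not list its fields. */
typedef struct {
    uint64_t last_prompt_tokens;
    uint64_t last_generated_tokens;
    uint64_t last_ticks;
    uint64_t session_messages;
    uint64_t session_tokens;   /* assumption: generated tokens only */
    uint64_t session_ticks;
} chat_perf_t;

/* Read a free-running counter: the TSC on x86, the virtual counter on AArch64. */
static inline uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
    return v;
#else
    return 0; /* no counter available in this sketch */
#endif
}

/* Record one message and fold it into the session totals. */
static void chat_perf_record(chat_perf_t *p, uint64_t prompt_tokens,
                             uint64_t generated_tokens,
                             uint64_t start_ticks, uint64_t end_ticks)
{
    p->last_prompt_tokens = prompt_tokens;
    p->last_generated_tokens = generated_tokens;
    p->last_ticks = end_ticks - start_ticks;

    p->session_messages++;
    p->session_tokens += generated_tokens;
    p->session_ticks += p->last_ticks;
}

/* Tokens per second multiplied by 100 (so 1000 means 10.00 tok/s).
 * Returns 0 when no time has elapsed. The product can overflow for very
 * large token counts; a real implementation would guard against that. */
static uint64_t tokens_per_sec_x100(uint64_t tokens, uint64_t ticks,
                                    uint64_t ticks_per_sec)
{
    if (ticks == 0)
        return 0;
    return tokens * ticks_per_sec * 100 / ticks;
}
```

Note that cntvct_el0 advances at the frequency reported by cntfrq_el0 rather than at the CPU clock, and the TSC rate must be calibrated separately, so `ticks_per_sec` has to come from the platform in either case. Average session throughput is then `tokens_per_sec_x100(p->session_tokens, p->session_ticks, ticks_per_sec)`.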
Summary
Adds a dedicated interactive chat mode ('talk') and separates performance metrics from chat output.

New Commands

'talk' - Interactive Chat Mode
- You>/AI> prompts
- 'perf' command for quick stats during chat

'perf' - Performance Metrics

Console UX Improvements
- 'help' command with ASCII box formatting
- 'help ai' for AI-specific command reference
- 'status', 'clear'/'cls', 'version'/'ver' commands

Implementation
- chat_perf_t struct for timing tracking
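Inside a 'talk' session, each input line is either an inline command ('perf', or 'exit'/'quit'/'q') or a message for the model. The PR does not show this dispatch; the sketch below is one way it might look, assuming the trailing newline has already been stripped. `talk_action_t` and `talk_classify` are hypothetical names.

```c
#include <string.h>

/* Hypothetical result of inspecting one line typed at the You> prompt. */
typedef enum {
    TALK_EMPTY,    /* nothing typed: prompt again */
    TALK_PERF,     /* show session stats inline */
    TALK_EXIT,     /* leave the session and print the summary */
    TALK_MESSAGE   /* forward the text to the inference engine */
} talk_action_t;

/* Classify a line; assumes the trailing newline was already removed. */
static talk_action_t talk_classify(const char *line)
{
    /* Skip leading blanks so "  quit" still exits. */
    while (*line == ' ' || *line == '\t')
        line++;

    if (*line == '\0')
        return TALK_EMPTY;
    if (strcmp(line, "perf") == 0)
        return TALK_PERF;
    if (strcmp(line, "exit") == 0 || strcmp(line, "quit") == 0 ||
        strcmp(line, "q") == 0)
        return TALK_EXIT;
    return TALK_MESSAGE;
}
```

Matching whole lines exactly means a message that merely mentions "perf" or "quit" is still sent to the model instead of being treated as a command.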