Roadmap #20

Open
6 of 31 tasks
ydennisy opened this issue Apr 7, 2024 · 0 comments
ydennisy commented Apr 7, 2024

Roadmap

A simple catch-all issue for roadmap items.

Private roadmap with secret features is here.

Todo

Features:

  • Receive links from email sent to a unique per-user alias.
  • Improve the way Hacker News links are handled: make two KG entries, one for the target link and one for the discussion thread.
  • Start to connect documents: this can be done by extracting entities and creating an edge between docs based on the number of shared entities. One complication is figuring out a normalisation strategy, so that large docs are not more "strongly" linked to all other docs.
  • Improve the way sources are shown in the UI after the ask, mainly ensure this is consistent in terms of formatting.
  • Index many more sites and in the process fix bugs related to how sites are crawled, embedded and indexed.
  • Work out the admin features for re-processing existing URLs, to allow for improvements to things like chunking strategies. To facilitate a job that re-processes (migrates) indexed URLs, we could add a version to each URL showing which version of the system processed it. A related idea is to treat these changes as migrations, following the same strategy used for SQL, i.e. add a new chunking strategy to a folder and the system recognises and runs it.
  • Add tests! Especially for URL indexing, and ensuring failed URLs do not take down the rest in the batch.
  • Use the summary, not the whole document, for the embedding of the text node.
  • Need to properly handle tables and formulae, both on input and output (generation).
  • Show the user which parts of the context produced their answer.
  • Another use case is "given all my documents, find a document in which x is mentioned".
  • Upload documents from your local machine.
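
For the document-linking item above, one possible normalisation is Jaccard similarity: the number of shared entities divided by the number of distinct entities across both docs. This is only a sketch of that idea, not a decided approach; the function names, the `min_weight` threshold and the sample entity sets are all made up for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared entities divided by all distinct entities in both docs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_edges(doc_entities: dict[str, set[str]], min_weight: float = 0.1):
    """Return (doc_a, doc_b, weight) for each pair at or above min_weight.

    min_weight is a hypothetical cut-off to avoid storing very weak edges.
    """
    edges = []
    for id_a, id_b in combinations(sorted(doc_entities), 2):
        weight = jaccard(doc_entities[id_a], doc_entities[id_b])
        if weight >= min_weight:
            edges.append((id_a, id_b, weight))
    return edges


# Hypothetical entity sets: two short docs and one long doc.
docs = {
    "small_a": {"python", "docker"},
    "small_b": {"python", "docker", "gcp"},
    "large": {"python", "docker", "gcp", "sql", "llm",
              "rag", "html", "css", "http", "auth"},
}

for doc_a, doc_b, weight in build_edges(docs):
    print(f"{doc_a} -- {doc_b}: {weight:.2f}")
```

With a raw shared-entity count, `large` and `small_b` (3 shared) would be linked more strongly than `small_a` and `small_b` (2 shared). After dividing by the union, the two short docs get the strongest edge, which is the behaviour the item asks for. Other normalisations (for example weighting entities by rarity) may fit better once real data is available.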

Technical

  • Try using https://docs.python.org/3/library/zipapp.html to zip the app and run it inside Docker, to speed up cold start times on GCP.
  • Implement separate indexing service and use a smart chunking approach.
  • Improve the LLM prompting, using well-known patterns.
  • Use litellm to access multiple LLMs through a single API.
  • Serve the homepage as a static page: once this is figured out we can also start to serve more pages as static pages - for example all document node pages. Another idea is to expose past Q&A as static docs.
  • Add IP rate limiting, and swap from 429 to another status code to differentiate a GCP cold start from too many requests from the same client or exhausted tokens.
  • Need a proper way to handle long text documents. Right now the setup is very inefficient, and I have added a hacky size reduction that chops some words.
  • Explore use of pex to make start up times quicker.
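
For the IP rate limiting item above, a minimal in-memory sliding-window limiter could look like the sketch below. It assumes a single process (state is not shared across instances), and the class name, limits and injectable `clock` are illustrative choices rather than anything decided for this project. Which status code to return on denial is left open, as in the item itself.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Example: 2 requests per 10 seconds, driven by a fake clock.
fake_now = [0.0]
limiter = SlidingWindowLimiter(2, 10.0, clock=lambda: fake_now[0])
print(limiter.allow("203.0.113.7"))  # first request in the window
print(limiter.allow("203.0.113.7"))  # second request in the window
print(limiter.allow("203.0.113.7"))  # third request, over the limit
```

A request handler would call `allow(client_ip)` before doing any work and return the chosen "too many requests" status when it yields `False`. On a platform that scales to multiple instances, the counters would need to live in shared storage instead of process memory.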

UX

  • Implement an onboarding page with all key actions explained.
  • Add tabs for popular document types: Wikipedia, arXiv, HN, etc.
  • Show a loader until the app is warmed up, or improve cold start.

Bugs

  • If auth fails on the backend, log the user out in the UI and redirect to the login page.

Styling

Done

  • Consider adding a feed page showing all recently indexed documents, their state etc.
  • Show a "nothing indexed" banner when first arriving at index page. #38
  • Move the indexing process to run in the background. It currently takes too long, and users should not need to wait.
  • Allow sharing search links with the search inside, to share results with others.
  • Create AI summaries at document indexing time rather than at retrieval time.
  • Allow chat with a document in the node details view. This could look like moving back to the ask page with the document inserted into the context.
@ydennisy ydennisy added the enhancement New feature or request label Apr 7, 2024
@ydennisy ydennisy self-assigned this Apr 7, 2024