Bundling issues with tiktoken (Error: Missing tiktoken_bg.wasm) #1127

Open
marcusschiesser opened this issue Aug 19, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@marcusschiesser
Collaborator

I am opening this ticket to gather all issues related to bundling the WASM from https://github.com/dqbd/tiktoken:

  1. An AWS Node.js serverless project; see the comment on #1110, "Node Serverless deployment fails due to bundling issue".
  2. Next.js deployed on Vercel; see create-llama#164, "Error: Missing tiktoken_bg.wasm" (fixed by copying the WASM file; see https://github.com/run-llama/create-llama/pull/201/files).

If you encounter this issue, please post your setup and configuration here.
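
The copy-the-file workaround mentioned in item 2 can be sketched as a small post-build step. This is a minimal sketch under stated assumptions, not the change made in the linked PR: the function name copyTiktokenWasm and both directory arguments are hypothetical, and it assumes the tiktoken package ships tiktoken_bg.wasm at its package root.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// File name taken from the error message in this issue.
const WASM_FILE = "tiktoken_bg.wasm";

/**
 * Copies tiktoken_bg.wasm from `packageDir` into `outDir` and returns the
 * target path. Both arguments are hypothetical; adjust them to your layout.
 */
function copyTiktokenWasm(packageDir: string, outDir: string): string {
  const source = path.join(packageDir, WASM_FILE);
  if (!fs.existsSync(source)) {
    // Fail loudly at build time instead of at runtime in production.
    throw new Error(`Expected ${WASM_FILE} in ${packageDir}`);
  }
  fs.mkdirSync(outDir, { recursive: true });
  const target = path.join(outDir, WASM_FILE);
  fs.copyFileSync(source, target);
  return target;
}
```

A call such as copyTiktokenWasm("node_modules/tiktoken", "dist") would run after the bundler; the right output directory depends on where the bundled code ends up looking for the file, which differs per setup.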

@LeonhardZehetgruber

I am encountering this issue when trying to integrate llamaindex into my Obsidian plugin. The build output for the plugin is a bundled main.js file.

package.json (the relevant part):

{
	"type": "module",
	"scripts": {
		"dev": "node esbuild.config.mjs"
	},
	"dependencies": {
		"llamaindex": "0.5.20"
	}
}

esbuild.config.mjs:

import esbuild from "esbuild";
import process from "node:process";
import builtins from "builtin-modules";

const context = await esbuild.context({
	entryPoints: { main: "src/main.ts" },
	bundle: true,
	platform: "node",
	external: [
		"obsidian",
		"electron",
		"sharp",
		"onnxruntime-node",
		"./xhr-sync-worker.js",
		...builtins],
	mainFields: ["browser", "module", "main"],
	conditions: ["browser"],
	format: "cjs",
	target: "es2022",
	logLevel: "info",
	treeShaking: true,
	outdir: "."
});

await context.rebuild();
process.exit(0);

tsconfig.json:

{
	"compilerOptions": {
		"baseUrl": "./src",
		"target": "es2022",
		"module": "ESNext",
		"moduleResolution": "bundler",
		"esModuleInterop": true,
		"skipLibCheck": true,
		"types": [
			"node",
			"jest"
		],
		"lib": [
			"DOM",
			"ES5",
			"ES6",
			"ES7",
			"ES2021",
			"ES2022"
		]
	},
	"include": [
		"**/*.ts"
	]
}

If I now use the following in my main.ts:

import { HuggingFaceEmbedding, Settings } from 'llamaindex';

Settings.embedModel = new HuggingFaceEmbedding({
	modelType: 'nomic-ai/nomic-embed-text-v1.5',
	quantized: false
});

I get this error in the developer console: "Error: Missing tiktoken_bg.wasm" at node_modules/tiktoken/tiktoken.cjs.
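
One direction to try for a single-file esbuild bundle like this is to copy the WASM file next to main.js after every build. The following is an untested sketch, not a confirmed fix for Obsidian: the plugin name, the function copyTiktokenWasmPlugin, and its arguments are hypothetical. It relies only on esbuild's plugin shape (an object with name and setup, and build.onEnd), described here with small local types so the sketch does not depend on esbuild's typings. Whether tiktoken's loader then finds the file depends on how __dirname resolves inside Obsidian, which this thread does not establish.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Minimal structural types covering the part of esbuild's plugin API used here.
interface BuildLike {
  onEnd(callback: () => void): void;
}
interface PluginLike {
  name: string;
  setup(build: BuildLike): void;
}

/**
 * Hypothetical plugin: after each build, copy the WASM file at `wasmSource`
 * into `outDir`, keeping its file name.
 */
function copyTiktokenWasmPlugin(wasmSource: string, outDir: string): PluginLike {
  return {
    name: "copy-tiktoken-wasm",
    setup(build) {
      build.onEnd(() => {
        fs.mkdirSync(outDir, { recursive: true });
        fs.copyFileSync(wasmSource, path.join(outDir, path.basename(wasmSource)));
      });
    },
  };
}
```

In the esbuild.config.mjs above, this would be registered through the plugins option, for example plugins: [copyTiktokenWasmPlugin("node_modules/tiktoken/tiktoken_bg.wasm", ".")], assuming the WASM file sits at that path in the installed package.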

@AndreMaz
Contributor

AndreMaz commented Sep 25, 2024

In case someone else runs into the same issue, this is how I solved it.

My next.config.mjs:

import path from "path";
import { fileURLToPath } from "url";
import _jiti from "jiti";

import { withLlamaIndex } from "@web/chatbot/next";

const jiti = _jiti(fileURLToPath(import.meta.url));

// Import env files to validate at build time. Use jiti so we can load .ts files in here.
jiti("./src/env");

const isStaticExport = "false";

// Get __dirname equivalent for ES modules
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

/**
 * @type {import("next").NextConfig}
 */
const nextConfig = {
  basePath: process.env.NEXT_PUBLIC_BASE_PATH,
  serverRuntimeConfig: {
    PROJECT_ROOT: __dirname,
  },
  env: {
    BUILD_STATIC_EXPORT: isStaticExport,
  },
  // Trailing slashes must be disabled for Next Auth callback endpoint to work
  // https://stackoverflow.com/a/78348528
  trailingSlash: false,
  modularizeImports: {
    "@mui/icons-material": {
      transform: "@mui/icons-material/{{member}}",
    },
    "@mui/material": {
      transform: "@mui/material/{{member}}",
    },
    "@mui/lab": {
      transform: "@mui/lab/{{member}}",
    },
  },
  webpack(config) {
    config.module.rules.push({
      test: /\.svg$/,
      use: ["@svgr/webpack"],
    });

    // To allow chatbot to work
    // Extracted from: https://github.com/neondatabase/examples/blob/main/ai/llamaindex/rag-nextjs/next.config.mjs
    config.resolve.alias = {
      ...config.resolve.alias,
      sharp$: false,
      "onnxruntime-node$": false,
    };

    // From: https://github.com/dqbd/tiktoken?tab=readme-ov-file#nextjs
    config.experiments = {
      asyncWebAssembly: true,
      layers: true,
    };

    return config;
  },
  ...(isStaticExport === "true" && {
    output: "export",
  }),

  experimental: {
    outputFileTracingIncludes: {
      "/*": ["./cache/**/*"],
      "/api/**/*": ["./node_modules/**/*.wasm"],
    },
    serverComponentsExternalPackages: ["tiktoken", "onnxruntime-node"],
  },

  /** Enables hot reloading for local packages without a build step */
  transpilePackages: [
    "@web/api",
    "@web/auth",
    "@web/db",
    "@web/ui",
    "@web/validators",
    "@web/services",
    "@web/utils",
    "@web/logger",
    "@web/certs",
    "@web/chatbot",
  ],
  /** We already do linting and typechecking as separate tasks in CI */
  eslint: { ignoreDuringBuilds: true },
  typescript: { ignoreBuildErrors: true },
};

const withLlamaIndexConfig = withLlamaIndex(nextConfig);

export default withLlamaIndexConfig;

In my case, everything related to llamaindex lives in the @web/chatbot package, which is why withLlamaIndex is imported from @web/chatbot/next.

Here is what the package.json of @web/chatbot looks like:

{
  "name": "@web/chatbot",
  "private": true,
  "version": "0.1.0",
  "type": "module",
  "exports": {
    ".": "./src/index.ts",
    "./next": "./src/with-lama-index.mjs"
  },
  "license": "MIT",
  "scripts": {
    "clean": "rm -rf .turbo node_modules",
    "format": "prettier --check . --ignore-path ../../.gitignore --ignore-path ../../.prettierignore",
    "lint": "eslint .",
    "typecheck": "tsc --emitDeclarationOnly"
  },
  "devDependencies": {
    "@web/eslint-config": "workspace:*",
    "@web/prettier-config": "workspace:*",
    "@web/tsconfig": "workspace:*",
    "@web/utils": "workspace:*",
    "eslint": "catalog:",
    "prettier": "catalog:",
    "typescript": "catalog:"
  },
  "prettier": "@web/prettier-config",
  "dependencies": {
    "@web/logger": "workspace:*",
    "@t3-oss/env-nextjs": "catalog:",
    "js-tiktoken": "^1.0.14",
    "llamaindex": "catalog:",
    "pg": "^8.13.0",
    "tiktoken": "^1.0.16"
  }
}

For reference: next.config.mjs and my repo structure are based on the create-t3-turbo repo.

For more context, see #1226.
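
The tiktoken-related parts of the next.config.mjs above can be isolated into a small wrapper. This is a sketch for readability, not part of the original comment: the helper name withTiktokenWasm and the loose types are hypothetical, and the option names are copied from the config above, so they may differ across Next.js versions. Unlike the original, it merges into any existing webpack experiments instead of replacing them.

```typescript
// Loose config shapes, just enough for this sketch (not Next.js's real types).
type WebpackConfig = { experiments?: Record<string, unknown>; [key: string]: unknown };
type NextConfigLike = {
  webpack?: (config: WebpackConfig, options?: unknown) => WebpackConfig;
  experimental?: Record<string, unknown>;
  [key: string]: unknown;
};

/** Hypothetical helper: layers the tiktoken-related settings onto a Next.js config. */
function withTiktokenWasm(nextConfig: NextConfigLike = {}): NextConfigLike {
  return {
    ...nextConfig,
    webpack(config, options) {
      // Run any existing webpack hook first so its changes are preserved.
      const base = nextConfig.webpack ? nextConfig.webpack(config, options) : config;
      // Enable WebAssembly support, as the tiktoken README suggests for Next.js.
      base.experiments = { ...base.experiments, asyncWebAssembly: true, layers: true };
      return base;
    },
    experimental: {
      ...nextConfig.experimental,
      // These two keys overwrite any existing values with the same names.
      outputFileTracingIncludes: { "/api/**/*": ["./node_modules/**/*.wasm"] },
      serverComponentsExternalPackages: ["tiktoken", "onnxruntime-node"],
    },
  };
}
```

It would be applied as withTiktokenWasm(nextConfig) before passing the result on to withLlamaIndex; the "/api/**/*" route pattern is taken from the config above and has to match the routes that actually load tiktoken.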

@himself65 added the bug label on Sep 30, 2024