Virtual file system support in TSServer #47600

Open
mjbvz opened this issue Jan 25, 2022 · 13 comments


@mjbvz
Contributor

@mjbvz mjbvz commented Jan 25, 2022

This proposal discusses adding support for a virtual file system (VFS) to TSServer. The contents of a virtual file system would be controlled by a client. Using virtual file systems, we believe we can deliver advanced features such as cross-file IntelliSense on vscode.dev and github.dev.

Context

The TypeScript server can currently work with two types of files: those on-disk and those in-memory (indicated by opening the file with a ^ prefix on the path). For the purposes of this discussion, on-disk files are files that TSServer can independently read using Node.js file system APIs, while the contents of in-memory files must always be synchronized with TSServer by a client.

Many IntelliSense features are only possible for on-disk files. This includes resolving imports across files, looking up typings, and constructing projects from a jsconfig or tsconfig. In all of these cases, TS implements these features by walking directories and reading files from the disk. None of this is currently possible for in-memory files.

However, in VS Code, users are increasingly using virtual workspaces that TSServer cannot read directly. On github.dev and vscode.dev, for example, the workspace is provided by a file system provider that reads the workspace contents directly from GitHub or other code storage services. While we can synchronize the opened editors over to the TS Server, IntelliSense support for them is still quite limited.

Bringing proper virtual file system support to TSServer seems like the best solution to enable a desktop-like IntelliSense experience on github.dev and vscode.dev.

Motivating use cases

Cross-file IntelliSense on web

When a user opens a github.dev or vscode.dev workspace, we would like to provide cross-file IntelliSense by resolving imports. Eventually we would even like to provide project IntelliSense by parsing tsconfig/jsconfig files.

To implement this, we need to synchronize the workspace contents over to the TS Server so that the server can read files besides the ones that are currently opened.

Support for virtual workspaces on desktop

With desktop versions of VS Code, users can also open virtual workspaces. Working with JS/TS files in these virtual workspaces should be just like working with JS/TS files on-disk.

The requirements to implement this are almost identical to the web case listed above.

Automatic Type Acquisition (ATA) on web

When a user opens a JS/TS file from github.dev or vscode.dev, we would like to automatically download typings to provide better IntelliSense.

To implement this, we need a way to tell TS about typings files and where these d.ts files live within the project. Again, this is not possible today, but we believe it could be implemented using virtual file systems.

Additional goals

  • Do not introduce VS Code specific concepts even though VS Code will be the largest consumer.

  • Do not require a significant rewrite of the entire compiler/server. For example, the server is currently synchronous, so our proposal must not require converting it to be asynchronous.

Out of scope

This proposal only discusses virtual file system support. We will discuss the specifics of the individual use cases above in separate issues.

Proposal

For the purposes of this proposal, a virtual file system (VFS) is an in-memory representation of a file system. The structure and contents of the VFS are provided to TSServer by the client. TSServer will use its in-memory VFS to implement file system operations, such as file reads and directory walks. By routing these operations through the VFS, we should be able to implement features such as cross-file IntelliSense without having to rewrite the entire server.

Implementing virtual file system support will require:

  1. Establishing a protocol clients can use to work with a VFS.
  2. Actually implementing VFS support inside TS Server.

This proposal focuses only on the protocol. I don't have enough knowledge of TSServer's internals to come up with a plan for actually implementing it.

Protocol

updateFileSystem

updateFileSystem is a new protocol request that clients use to update the contents of a VFS. It is inspired by updateOpen and would take a list of created, deleted, and updated files on the VFS.

Virtual file systems each have a unique identifier. This identifier is used in calls to updateFileSystem and also will be used to open a file against a specific VFS.

Here's an example request for a memfs VFS:

updateFileSystem {
    fileSystem: 'memfs',
    created: [
        { path: "/workspace/index.js", contents: "import * as abc from './sub/abc'" },
        { path: "/workspace/src/abc.js", contents: "export const abc = 123;" },
        { path: "/workspace/test/xyz.test.ts", contents: "..." },
    ],
    deleted: [],
    updated: [
        { path: "/workspace/test/xyz.test.ts", contents: "..." }
    ]
}

The above proposal takes a flat list of files similar to update opened. If we think it would be more convenient, we could instead take a tree-like structure.

When TSServer receives an updateFileSystem request, it must update its internal in-memory representation of this VFS. However it should not yet start processing any of these files.
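As a rough illustration of the "store but do not process" behavior, here is a minimal sketch of how a server might apply such a request to a per-VFS map. The interface names, the assumption that deleted is a list of paths, and the applyUpdateFileSystem helper are all hypothetical; nothing here reflects TSServer's actual internals.

```typescript
// Hypothetical shapes: the proposal only sketches these fields informally.
interface VfsFileEntry {
    path: string;
    contents: string;
}

interface UpdateFileSystemArgs {
    fileSystem: string;        // VFS identifier, e.g. "memfs"
    created: VfsFileEntry[];
    deleted: string[];         // assumption: deleted entries are plain paths
    updated: VfsFileEntry[];
}

// One flat "path -> contents" map per VFS identifier.
const fileSystems = new Map<string, Map<string, string>>();

function applyUpdateFileSystem(args: UpdateFileSystemArgs): void {
    let files = fileSystems.get(args.fileSystem);
    if (!files) {
        files = new Map<string, string>();
        fileSystems.set(args.fileSystem, files);
    }
    for (const entry of args.created) files.set(entry.path, entry.contents);
    for (const entry of args.updated) files.set(entry.path, entry.contents);
    for (const path of args.deleted) files.delete(path);
    // Deliberately no parsing or project loading here: files are only stored.
}
```

Keeping the store as a flat map matches the flat request shape; a tree-like request would need a different representation.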

Open file on a given VFS

After initializing a VFS, clients then need to open a specific file on the VFS. For this, I propose we introduce a new style of path that can be used to talk about resources on a VFS:

memfs:/workspace/path/file.ts

This style of path is inspired by VS Code's URIs. We would need to add support for them to all places in the protocol where we take or return a path.
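To make the path style concrete, here is a small sketch of splitting a protocol path into a VFS identifier and a path. The exact grammar is an assumption (an identifier of at least two characters, a colon, then an absolute path), and parseProtocolPath is a hypothetical helper, not part of any existing API.

```typescript
interface ParsedPath {
    fileSystem: string | undefined; // undefined means a plain (non-VFS) path
    path: string;
}

// Assumption: a VFS path is "<identifier>:" followed by an absolute "/" path.
// Requiring at least two characters in the identifier keeps Windows drive
// paths such as "c:/src/a.ts" from being mistaken for a VFS path.
const vfsPathPattern = /^([a-zA-Z][a-zA-Z0-9+.-]+):(\/.*)$/;

function parseProtocolPath(input: string): ParsedPath {
    const match = vfsPathPattern.exec(input);
    if (!match) {
        return { fileSystem: undefined, path: input };
    }
    const [, fileSystem, path = input] = match;
    return { fileSystem, path };
}
```

For example, parseProtocolPath("memfs:/workspace/path/file.ts") yields the identifier "memfs" and the path "/workspace/path/file.ts", while an ordinary on-disk path comes back unchanged with no identifier.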

Example

Let's walk through how VS Code could implement workspace-wide IntelliSense on vscode.dev using this proposal.

  1. VS Code downloads and caches the entire contents of the workspace

    This is already implemented on the VS Code side.

  2. VS Code sends a static copy of the workspace over to TS Server using updateFileSystem

    updateFileSystem {
        fileSystem: 'memfs',
        opened: [
            { path: "/workspace/index.js", contents: "import * as abc from './sub/abc'" },
            { path: "/workspace/src/abc.js", contents: "export const abc = 123;" },
            { path: "/workspace/test/xyz.test.ts", contents: "..." },
        ]
    }
    
  3. TS Server receives the file system contents and sets up its own representation of the virtual file system.

    With the above request, TS server would construct an in-memory representation of the file system that looks like:

    workspace/
        index.js
        src/
            abc.js
        test/
            xyz.test.ts
    

    At this point, TS Server should not yet process any of these files or treat them as part of a TypeScript project. The files are only held in-memory and can be read later.

  4. VS Code opens index.js on the virtual file system

    Let's assume this happens because the user clicked on index.js to view it.

    At this point, VS Code uses a normal updateOpen call to tell TS server that the user has opened a JS or TS file. This file is part of the virtual file system.

    updateOpen {
        openFiles: [
            {  file: "memfs:/workspace/index.js", contents: "import * as abc from './sub/abc';" }
        ]
    }
    
  5. TS constructs project representation

    After index.js is opened, TS processes it and starts building up a representation of the TS project. In this case, it sees the import ./sub/abc in index.js and attempts to resolve the import. Using the virtual file system and opened files, the server first checks if the file memfs:/workspace/sub/abc.ts exists. Here all file system operations need to be routed through the virtual file system instead of trying to go to disk.

  6. User requests go to definition on a reference to abc in index.js

    Here VS Code would send a definitionAndBoundSpan request:

    definitionAndBoundSpan {
        file: "memfs:/workspace/index.js",
        line: 1,
        offset: 10
    }
    
  7. The server uses the VFS to respond

    definitionAndBoundSpanResponse {
        definitions: [
            { file: "memfs:/workspace/src/abc.ts", ....}
        ]
    }
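
Step 5 of the walkthrough can be sketched as a lookup that only ever consults the VFS. This is a deliberately simplified stand-in for module resolution, not TypeScript's actual algorithm; the vfs map, joinRelative, resolveRelativeImport, and the extension order are all assumptions made for illustration. The sketch imports './src/abc' so that the specifier matches the file stored in the map.

```typescript
// Flat VFS contents, loosely mirroring the example request.
const vfs = new Map<string, string>([
    ["/workspace/index.js", "import * as abc from './src/abc'"],
    ["/workspace/src/abc.js", "export const abc = 123;"],
]);

// Join a relative specifier onto the importing file's directory.
function joinRelative(fromFile: string, specifier: string): string {
    const segments = fromFile.split("/").slice(0, -1);
    for (const part of specifier.split("/")) {
        if (part === "" || part === ".") continue;
        if (part === "..") segments.pop();
        else segments.push(part);
    }
    return segments.join("/");
}

// Try a few extensions, asking only the VFS whether a candidate exists.
// No disk access happens anywhere in this function.
function resolveRelativeImport(fromFile: string, specifier: string): string | undefined {
    const base = joinRelative(fromFile, specifier);
    for (const ext of [".ts", ".tsx", ".d.ts", ".js", ".jsx"]) {
        if (vfs.has(base + ext)) return base + ext;
    }
    return undefined;
}
```

Because every existence check is a synchronous map lookup, this shape fits a server whose file system operations are expected to be synchronous.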
    

Alternatives considered

Delegate file system operations to the client

Instead of eagerly syncing the VFS over to TSServer, we could instead delegate individual file system operations back to the client.

This is likely not possible without a significant rewrite of the server. The server expects file system operations to be synchronous, and there is no good way to synchronously communicate from the TSServer worker process back to the main VS Code extension host process. Even if we could implement synchronous calls, doing so would not be ideal and would result in a large number of messages getting passed back and forth between the client and server.

@mjbvz
Contributor Author

@mjbvz mjbvz commented Jan 25, 2022

@andrewbranch Here are the steps I've been using to test local ts changes on a local web build of VS Code:

  1. Build a copy of TS locally

  2. In vscode, comment out the CopyPlugin section of this build file (this is the rule that runs the minifier):

    https://github.com/microsoft/vscode/blob/cde5781978134c0091d28a4f11b45b2f08412b4f/extensions/typescript-language-features/extension-browser.webpack.config.js#L63

  3. Build VS Code for web: yarn watch-web

  4. Link your built TS Server in place of the VS Code one:

    # In the VSCode repo
    cd extensions/typescript-language-features/dist/browser/typescript
    ln -s ~/projects/typescript/built/local/tsserver.js tsserver.web.js
    
  5. Now run for web: ./scripts/code-web.sh. This should open http://localhost:8080

@amcasey
Member

@amcasey amcasey commented Jan 26, 2022

I'm not sure I understand why a VFS would need an identifier - do we anticipate having multiple or mixing VFS and actual FS access? Why wouldn't this command just say "this is your view of the FS until further notice"?

@amcasey
Member

@amcasey amcasey commented Jan 26, 2022

Possibly relevant: I believe the tests already virtualize FS access.

@amcasey
Member

@amcasey amcasey commented Jan 26, 2022

Do we have a sense of how much slower this would be than a more specialized API saying "this tarball is your FS"? If it's substantial, we might want a "payload kind" property.

@mjbvz
Contributor Author

@mjbvz mjbvz commented Jan 26, 2022

I'm not sure I understand why a VFS would need an identifier - do we anticipate having multiple or mixing VFS and actual FS access? Why wouldn't this command just say "this is your view of the FS until further notice"?

On the basic web, I think everything will be on the VFS. But on desktop VS Code, you can end up in situations where some files exist on disk and some exist on a virtual file system. This could happen if you create a workspace that has one folder from disk and one from somewhere like GitHub or OneDrive, for example.

Do we have a sense of how much slower this would be than a more specialized API saying "this tarball is your FS"? If it's substantial, we might want a "payload kind" property.

My suspicion is that in most workspaces, the number of js/ts files we need to send over will be pretty small (< 100). For something like the typescript project, we would have to send a lot of files though. I don't expect the data transfer to be the main bottleneck but we will need to test this (we could also try to optimize it using transferables if needed)

We can also have a cap of the number of files we send if we do run into issues

@jrieken
Member

@jrieken jrieken commented Jan 26, 2022

I don't expect the data transfer to be the main bottleneck but we will need to test this (we could also try to optimize it using transferables if needed)

A qualified guesstimate is "F1 > Measure Extension Host Latency". Take it with a grain of salt but on vscode.dev with Safari I see up/download speeds ~2000Mbs.

It might also be noteworthy that a virtual file system is the more generic solution, e.g. vscode.dev also supports ADO and there no tarball is available. A VFS can abstract that away and give room for other optimisations like a browser-side git clone etc.

@DanielRosenwasser
Member

@DanielRosenwasser DanielRosenwasser commented Jan 27, 2022

I'm going to read through the proposal and just plop a bunch of discussion points that I want to work through. Please note it's all constructive, I'm just trying to think through the scenarios!


The above proposal takes a flat list of files similar to update opened. If we think it would be more convenient, we could instead take a tree-like structure.

One of the concerns here is deciding what is explicit and implicit. We should get a sense of when it is and isn't useful to make these distinctions. For example, you create a file called /foo/bar/baz.ts. That presumably means that there's a root directory at /, it contains a folder called foo, that contains a folder called bar, and that contains a file called baz.ts
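
To illustrate what "implicit" means for a flat list, here is a small sketch that derives the directories implied by a set of absolute file paths. The impliedDirectories helper is hypothetical and only models plain files and folders.

```typescript
// Collect every directory implied by a flat list of absolute file paths.
function impliedDirectories(filePaths: string[]): Set<string> {
    const dirs = new Set<string>(["/"]);
    for (const filePath of filePaths) {
        const segments = filePath.split("/").filter(s => s.length > 0);
        let current = "";
        // Every segment except the last one names a directory.
        for (const segment of segments.slice(0, -1)) {
            current += "/" + segment;
            dirs.add(current);
        }
    }
    return dirs;
}
```

Called with ["/foo/bar/baz.ts"], this produces the directories "/", "/foo", and "/foo/bar". A derivation like this has no way to express an empty directory or a symlink, which is the limitation raised in this comment.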

We can "do the right thing", but off the bat this can't support things like symlinks, other file-like abstractions, weird niche systems that could allow files and folders with the same name, etc. The more we see vscode.dev doing in the future, the more this sort of stuff will matter.

I think for me, the biggest concern is symlinks. If you're planning on simulating npm install-like scenarios, this will be good to have.


{ path: "/workspace/index.js", contents: "import * as abc from './sub/abc'" },

On this note, if we do go beyond directory and path, each entry having its own tag field instead of a bunch of optional properties like contents would be nice too. This might make it easier to guide the API into the pit of success.

In other words, what I'd hope for with this is that it's a discriminated union type. Maybe I'm going overkill.
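
One possible shape for such a discriminated union is sketched below. The kind tag, the three entry kinds, and the field names are illustrative assumptions rather than an agreed protocol.

```typescript
// Hypothetical entry kinds; the "kind" tag and field names are illustrative.
type VfsEntry =
    | { kind: "file"; path: string; contents: string }
    | { kind: "directory"; path: string }
    | { kind: "symlink"; path: string; target: string };

// The compiler narrows `entry` in each case, so `contents` and `target`
// are only accessible where they actually exist.
function describeEntry(entry: VfsEntry): string {
    switch (entry.kind) {
        case "file":
            return `file ${entry.path} (${entry.contents.length} chars)`;
        case "directory":
            return `directory ${entry.path}`;
        case "symlink":
            return `symlink ${entry.path} -> ${entry.target}`;
    }
}
```

Compared with a single object type full of optional properties, a tag makes invalid combinations (for example a directory with contents) unrepresentable.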


updateFileSystem {
    fileSystem: 'memfs',
    opened: [
        { path: "/workspace/index.js", contents: "import * as abc from './sub/abc'" },
        { path: "/workspace/src/abc.js", contents: "export const abc = 123;" },
        { path: "/workspace/test/xyz.test.ts", contents: "..." },
    ]
}

I know this might just be a rough sketch now, but should this be created and not opened?


updateFileSystem {
    fileSystem: 'memfs',

I dunno if @amcasey already asked this, but what else goes in place of memfs?


On the basic web, I think everything will be on the VFS. But on desktop VS Code, you can end up in situations where some files exist on disk and some exist on a virtual file system. This could happen if you create a workspace that has one folder from disk and one from somewhere like GitHub or OneDrive, for example.

This leads me into a few questions:

  • Who's the source of truth for TSServer? Does memfs always take precedence?
  • If we have the file system available, do we ever watch for a file of the same name on disk?
  • Relatedly, can projects contain files from both the VFS and disk? Or are they disjoint?

A qualified guesstimate is "F1 > Measure Extension Host Latency". Take it with a grain of salt but on vscode.dev with Safari I see up/download speeds ~2000Mbs.

That's cool! Still, we'd need to be a little careful I assume. Is a worker able to hold all this in memory? We'll need to make sure no client ever says "we just sent you 2 GB, good luck, have fun".

Let me ask a weirder question: why wouldn't VS Code on desktop always talk to TypeScript through a VFS? We already have twice the number of file-watchers that we need.

I think the answer is that language servers contain the necessary logic to filter down files - but if transfer speed is not a problem, can you just toss everything over to TS all the time, regardless of vscode.dev vs. VS Code desktop?

Feels like a crazy suggestion, but maybe it's an assumption worth revisiting!

@DanielRosenwasser
Member

@DanielRosenwasser DanielRosenwasser commented Jan 27, 2022

I think the answer is that language servers contain the necessary logic to filter down files - but if transfer speed is not a problem, can you just toss everything over to TS all the time, regardless of vscode.dev vs. VS Code desktop?

Uh, and I guess editors also don't want to traverse and read every single file in a workspace. 🙄🤦‍♂️

But then if that's a concern, how would we get ATA to be speedy?

@andrewbranch
Member

@andrewbranch andrewbranch commented Jan 27, 2022

I had similar questions about whether this can/should be enabled alongside a real FS on desktop. That could open up a lot of possibilities for VS Code extensions / LS plugins, but could also become a source of hard-to-diagnose bugs, and could make it difficult for us to update the API if it saw a surge of third-party adoption.

@DanielRosenwasser
Member

@DanielRosenwasser DanielRosenwasser commented Jan 27, 2022

memfs:/workspace/index.js

Here's another thing - is this a new URI protocol style we would want to respond to? Would it make more sense for the request to talk about files being in memfs: in the first place?

updateFileSystem {
    created: [
        { path: "memfs:/workspace/index.js", contents: "import * as abc from './sub/abc'" },
        { path: "memfs:/workspace/src/abc.js", contents: "export const abc = 123;" },
        { path: "memfs:/workspace/test/xyz.test.ts", contents: "..." },
    ],
    deleted: [],
    updated: [
        { path: "memfs:/workspace/test/xyz.test.ts", contents: "..." }
    ]
}

@mjbvz
Contributor Author

@mjbvz mjbvz commented Jan 27, 2022

Thanks for the feedback. Just a few thoughts on some of these points:


One of the concerns here is deciding what is explicit and implicit.

I think a more explicit, tree-like structure for the file system has a few benefits but may be more difficult to implement/work with. For example, using a tree would let us express empty directories but then we'd also have to tell TS when a directory is deleted. Let's talk about this more


We can "do the right thing", but off the bat this can't support things like symlinks, other file-like abstractions, weird niche systems that could allow files and folders with the same name, etc. The more we see vscode.dev doing in the future, the more this sort of stuff will matter.

In general, I think we should try aligning with VS Code's model of file systems. That way we don't have to worry about aspects such as case-sensitive vs case-insensitive file names

Symlinks are interesting though. I believe (but haven't actually tested this yet) that VS Code's virtual file system supports them. Something to consider for the design even if they are not supported in a v1


This leads me into a few questions:

  1. Who's the source of truth for TSServer? Does memfs always take precedence?
  2. If we have the file system available, do we ever watch for a file of the same name on disk?
  3. Relatedly, can projects contain files from both the VFS and disk? Or are they disjoint?
  1. IMO for TS Server, the source of truth is: first the opened file and then the VFS.

    TSServer should never try updating its model of the VFS based on changes to opened files

  2. No, I don't think this should happen.

    With my thinking, we should never end up in a case where a file exists on both disk and in the VFS. If a client does wish to surface an on-disk file through the VFS, it must always talk to TSServer using the VFS

  3. Yes I think we should allow this, but we'd have to figure out what that actually means for TS projects.

    A use case for VS Code would be: I'm browsing a normal on-disk repo on desktop VS Code and then try opening a JS file that's contained inside of an archive, such as a zip. This file could be surfaced to TSServer using a VFS
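
The precedence described in answer 1 (opened file first, then the VFS) can be sketched as a two-level lookup. The openedFiles and vfsFiles maps and the readFileForServer helper are hypothetical names used only for illustration.

```typescript
// Both maps are keyed by protocol path, e.g. "memfs:/workspace/index.js".
const openedFiles = new Map<string, string>(); // contents synced via updateOpen
const vfsFiles = new Map<string, string>();    // contents synced via updateFileSystem

// Opened-editor contents win; otherwise fall back to the VFS copy.
// Reading never writes to vfsFiles, so edits in an open editor do not
// change the server's model of the VFS.
function readFileForServer(path: string): string | undefined {
    const opened = openedFiles.get(path);
    if (opened !== undefined) return opened;
    return vfsFiles.get(path);
}
```

When the editor closes the file, removing it from openedFiles makes the VFS copy visible again without any extra bookkeeping.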


Let me ask a weirder question: why wouldn't VS Code on desktop always talk to TypeScript through a VFS? We already have twice the number of file-watchers that we need.

Interesting thought!

With this proposal, the downside is that we have to eagerly sync over the VFS contents. In previous discussions however, we did talk about having TS be able to delegate file system operations back to VS Code. The main problem we ran into is that TS expects file reads to be sync, and there's not a good way to implement that since we'd have to communicate back and forth with the VS Code process


Here's another thing - is this a new URI protocol style we would want to respond to? Would it make more sense for the request to talk about files being in memfs: in the first place?

Yes I like that, although this protocol may need to completely change if we decide we need a more explicit api

@DanielRosenwasser
Member

@DanielRosenwasser DanielRosenwasser commented Jan 28, 2022

A use case for VS Code would be: I'm browsing a normal on-disk repo on desktop VS Code and then try opening a JS file that's contained inside of an archive, such as a zip.

I want to dig into that more to better clarify how to model mixed virtual/non-virtual files in the same project.

Let's say you have a project on disk with /proj/foo.ts and /proj/bar.ts. A client editor says "Hey, /proj/bar.ts exists in a virtual filesystem."

If you allow these files to mix, then my assumption is that the virtual file system has to take precedence for all LS operations. But I don't know what TSServer would do with the old SourceFile of /proj/bar.ts. I also don't know if there's a case where some projects see the on-disk /proj/bar.ts and others see the in-memory /proj/bar.ts - I think

To continue, let's say the client editor says "/proj/bar.ts no longer exists in the virtual filesystem." I would assume that TypeScript still has to re-resolve /proj/bar.ts on disk in some way for at least one of the projects that originally contained it, right?

These might be weird edge-cases, but I want us to have a good mental model and some well-understood behavior here.


The main problem we ran into is that TS expects file reads to be sync, and there's not a good way to implement that since we'd have to communicate back and forth with the VS Code process

Yeah, that is weird - but this current idea might be less chatty than something like that. This isn't something we have to do for v1 of course.

@DanielRosenwasser
Member

@DanielRosenwasser DanielRosenwasser commented Feb 2, 2022

One thing that came up in our sync today was whether we had to proactively defend against extremely large files, or too many files that are irrelevant to TS. Our thinking is that we should stick to something simple at first, but that we have room to grow and improve the experience here. One idea was to have editors send over truncated files, or omit their contents entirely, and mark those files as "proactively omitted". The server could then signal "hey, we actually did need these files if they're still around", and an editor could choose to provide them.
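
A minimal sketch of the "proactively omitted" idea follows. The SyncedFile shape, the size limit, and both helper functions are assumptions made for illustration; the thread does not define a concrete format.

```typescript
// Hypothetical payload: a file is either sent in full or marked as omitted.
type SyncedFile =
    | { path: string; omitted: false; contents: string }
    | { path: string; omitted: true };

// Arbitrary illustrative limit, measured in string length (UTF-16 code units).
const MAX_SYNCED_FILE_LENGTH = 1_000_000;

// Client side: drop the contents of files above the limit.
function toSyncedFile(
    path: string,
    contents: string,
    limit: number = MAX_SYNCED_FILE_LENGTH,
): SyncedFile {
    return contents.length > limit
        ? { path, omitted: true }
        : { path, omitted: false, contents };
}

// Server side: of the files it turned out to need, which were omitted
// and should be requested from the client?
function pathsToRequest(synced: SyncedFile[], needed: Set<string>): string[] {
    return synced
        .filter(file => file.omitted && needed.has(file.path))
        .map(file => file.path);
}
```

The editor remains free to ignore such a request, which matches the "an editor could choose to provide them" framing.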
