
Clean databases folder on startup#675

Merged
alexet merged 1 commit into
github:mainfrom
aeisenberg:aeisenberg/clean-orphan-dbs
Nov 16, 2020

Conversation

@aeisenberg
Contributor

@aeisenberg aeisenberg commented Nov 6, 2020

Cleans orphan databases on startup. This commit also bumps the fs-extra
dependency to get readdir with dirent objects.

Adds an asyncFilter function that filters arrays asynchronously.

Implemented as a command so that a savvy user could assign a keyboard shortcut to it, but it is not accessible through the command palette.

Fixes #674.
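The `asyncFilter` implementation itself is not shown in this conversation. Below is a minimal sketch of such a helper. It assumes the helper takes an array and a predicate that returns `Promise<boolean>`. It evaluates all predicates concurrently and then keeps the matching elements in their original order:

```typescript
/**
 * Filters an array using an asynchronous predicate (illustrative sketch).
 * All predicates run concurrently; surviving elements keep their original order.
 */
async function asyncFilter<T>(
  arr: readonly T[],
  predicate: (item: T) => Promise<boolean>
): Promise<T[]> {
  // Start every check at once and wait for all of them to settle.
  const results = await Promise.all(arr.map(item => predicate(item)));
  // Keep the element at index i only if its predicate resolved to true.
  return arr.filter((_, i) => results[i]);
}
```

Running the predicates through `Promise.all` lets slow checks, such as file-system lookups, overlap. If the predicates must not run in parallel, a sequential `for ... of` loop with `await` would be the alternative.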

Checklist

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • @github/docs-content-dsp has been cc'd in all issues for UI or other user-facing changes made by this pull request.

@aeisenberg aeisenberg force-pushed the aeisenberg/clean-orphan-dbs branch 5 times, most recently from 5eb12ef to 4aa11e4 on November 7, 2020 17:54
@aeisenberg
Contributor Author

Ping! @github/docs-content-dsp This is not entirely a user-facing feature, and perhaps mentioning it in the changelog is good enough. But some users may want to be aware of this.

From the changelog:

Whenever the extension restarts, orphaned databases will be cleaned up. These are databases whose files are located inside of the extension's storage area, but are not imported into the workspace.

@hubot

hubot commented Nov 7, 2020

:octocat:📚 Thanks for the docs ping! 🛎️ This was added to our docs first-responder project board. A team member will be along shortly to review this for docs impact, but you can also open a docs issue to request docs updates.

@aeisenberg aeisenberg force-pushed the aeisenberg/clean-orphan-dbs branch from 4aa11e4 to 4a3c2b4 on November 7, 2020 18:04
@shati-patel
Contributor

Thanks for letting us know 👍🏽 What does "cleaning up" the database actually mean in this case? As long as it doesn't break the database or do anything unexpected, we're probably fine with just a changelog entry 😊

@aeisenberg
Contributor Author

Cleaning up means deleting unused databases.

In this case, the user first imported a database as a zip file (or from a URL or LGTM), and it was placed inside the extension's storage area (a place that is not user-facing). Then the user removed this database from the extension. Most of the time, this means that the database is deleted from the file system as well. Sometimes, though, the database cannot be removed from disk right away, typically because Windows has not released the file system lock. Thus the database is orphaned: it exists, but is no longer used anywhere.

This change will ensure that these orphaned databases are removed eventually.

Note that this change does not affect databases added as a filesystem folder. We assume these databases are user-controlled even after being removed from the extension.
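The rule described above says that only directories inside the extension's storage area are cleanup candidates. Databases added as a filesystem folder are left alone. A path-containment guard could enforce this. The sketch below is hypothetical and uses only Node's `path` module. `isInsideStorage` is an illustrative name, not a function from the PR:

```typescript
import * as path from 'path';

/**
 * Hypothetical guard: true only if `candidate` lies strictly inside
 * `storagePath`. A database the user added as a folder elsewhere on disk
 * fails this check, so it would never be deleted.
 */
function isInsideStorage(candidate: string, storagePath: string): boolean {
  const rel = path.relative(path.resolve(storagePath), path.resolve(candidate));
  // '' means the same directory; '..' or a '../' prefix means outside;
  // an absolute result happens on Windows when the drives differ.
  return (
    rel !== '' &&
    rel !== '..' &&
    !rel.startsWith('..' + path.sep) &&
    !path.isAbsolute(rel)
  );
}
```

Checking the first path segment rather than a bare `'..'` prefix avoids rejecting a legitimately named entry such as `..hidden` inside the storage folder.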

@shati-patel
Contributor

Great, thanks for clarifying! That all sounds sensible and pretty harmless. I don't think we need to document it.

@@ -0,0 +1,22 @@
import { fail } from 'assert';
Contributor

Typo in filename: pures -> pure

Comment on lines +721 to +728
const dbRegeEx = /^db-(javascript|go|cpp|java|python|csharp|ruby)$/;
function isLikelyDbFolder(dbPath: string) {
return path.basename(dbPath).match(dbRegeEx);
}

async function isDatabaseDirectory(dir: string) {
return (await fs.readdir(dir)).some(isLikelyDbFolder);
}
Contributor

Not very fond of having to maintain a list of language slugs.
Elsewhere, I expect we are checking for the existence of codeql-database.yml (or .dbinfo as a fallback) to determine whether a directory is a CodeQL database. Can we continue to use that here instead of relying on the dataset folder name?

Contributor Author

I agree about the hard-coded languages, but I'm not quite sure what to do about them. Maybe there is some kind of way we can discover this by introspecting which standard libraries are installed, but that seems complex for now.

I'll be more precise and check for one of those two files.
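Below is a sketch of the reviewer's suggestion. It assumes that `codeql-database.yml` (or the `.dbinfo` fallback) sits directly in the database's root directory. It tests for existence with Node's `fs.promises.access`. `DB_MARKER_FILES` and `fileExists` are illustrative names, not code from the PR:

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';

// Marker files named in the review, checked in order (assumed location: the database root).
const DB_MARKER_FILES = ['codeql-database.yml', '.dbinfo'];

async function fileExists(p: string): Promise<boolean> {
  try {
    await fs.access(p);
    return true;
  } catch {
    return false;
  }
}

/** True if `dir` contains one of the marker files, so no language list is needed. */
async function isDatabaseDirectory(dir: string): Promise<boolean> {
  for (const marker of DB_MARKER_FILES) {
    if (await fileExists(path.join(dir, marker))) {
      return true;
    }
  }
  return false;
}
```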

}
})
);
showAndLogErrorMessage(`Failed to delete orphaned databases:\n ${failures.join(' \n')}'. Must delete manually.`);
Contributor

Nit: why the leading space before the newline? And since this is user-facing, perhaps give them an action, e.g. 'To delete unused databases, please remove them manually from the workspace storage folder.'

Contributor Author

The leading space is there to indent the paths we could not remove. However, the space should come after the newline, not before it.

Also, the failures should contain the full path to the database. Maybe I will use only the basename and include the storage folder elsewhere.
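Below is one way to build the message along the lines discussed here. It lists only basenames, puts the indentation after each newline, names the storage folder once, and ends with the suggested action. `formatDeletionFailureMessage` is a hypothetical helper, and the exact wording is an assumption:

```typescript
import * as path from 'path';

/**
 * Hypothetical helper: builds the user-facing error for databases that
 * could not be deleted. Each entry is indented after its newline, only the
 * basename is listed, and the storage folder is named once at the end.
 */
function formatDeletionFailureMessage(storagePath: string, failedPaths: readonly string[]): string {
  const list = failedPaths.map(p => `\n  ${path.basename(p)}`).join('');
  return (
    `Failed to delete orphaned databases:${list}\n` +
    `To delete unused databases, please remove them manually from the workspace storage folder: ${storagePath}`
  );
}
```

The resulting string could then be passed to `showAndLogErrorMessage`, as the original code does with its inline template.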

.filter(dirent => dirent.isDirectory())
// get the full path
.map(dirent => path.join(this.storagePath, dirent.name))
// filter databases still in workspace
Contributor

Suggested change
// filter databases still in workspace
// filter out databases still in workspace

// filter databases still in workspace
.filter(dbDir => {
const dbUri = Uri.file(dbDir);
return this.databaseManager.databaseItems.every(item => item.databaseUri.fsPath !== dbUri.fsPath);
Contributor

I don't think we have to do this search for every identified directory in storage.
I suggest creating a Set up front containing all the fsPath values from this.databaseManager.databaseItems, and then this filter just becomes a set lookup.
My assumption here is that when this function is awaited, there is no way for the user to add databases to the workspace after the set is constructed but before cleanup completes (otherwise we have a race condition).
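The `Set`-based filter suggested above could look roughly like the sketch below. Several pieces are assumptions:

* It uses Node's built-in `fs.promises.readdir` with `withFileTypes: true` instead of fs-extra. That call returns `Dirent` objects, the same feature the PR bumps fs-extra for.
* It normalizes paths with `path.resolve` rather than going through `Uri.fsPath`.
* It does not handle case differences on Windows.
* `findOrphanedDatabaseDirs` and `importedDbPaths` are hypothetical names.

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';

/**
 * Returns the subdirectories of `storagePath` that do not belong to any
 * database still imported into the workspace (illustrative sketch).
 */
async function findOrphanedDatabaseDirs(
  storagePath: string,
  importedDbPaths: readonly string[]
): Promise<string[]> {
  // Build the lookup set once, normalizing paths so comparisons are exact.
  const imported = new Set(importedDbPaths.map(p => path.resolve(p)));

  // withFileTypes: true yields Dirent objects, so no separate stat per entry.
  const entries = await fs.readdir(storagePath, { withFileTypes: true });

  return entries
    .filter(dirent => dirent.isDirectory())
    // get the full path
    .map(dirent => path.resolve(storagePath, dirent.name))
    // filter out databases still in the workspace: one Set lookup per directory
    .filter(dbDir => !imported.has(dbDir));
}
```

Building the set once makes each membership check constant-time instead of scanning every database item for every directory. As noted above, this relies on no database being imported between building the set and finishing the cleanup.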

};

handleRemoveOrphanedDatabases = async (): Promise<void> => {
logger.log('Removing orphaned databases.');
Contributor

Suggested change
logger.log('Removing orphaned databases.');
logger.log('Removing orphaned databases from workspace storage.');

}
})
);
showAndLogErrorMessage(`Failed to delete orphaned databases:\n ${failures.join(' \n')}'. Must delete manually.`);
Contributor

This should only be called when failures is non-empty.

Contributor Author

yes
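Below is a sketch of the deletion step with that guard. It removes every directory concurrently and collects the failures, for example paths that Windows still has locked. It reports an error only if there is at least one failure. `removeDirectories` is hypothetical. By default it uses `fs.promises.rm`, which requires Node 14.14 or later. It also accepts an injectable `remove` function so the failure path can be exercised without a real locked file:

```typescript
import { promises as fs } from 'fs';

/**
 * Hypothetical helper: tries to delete every directory, collects the ones
 * that could not be removed, and calls `report` only when at least one
 * deletion failed. Returns the failures (in completion order).
 */
async function removeDirectories(
  dirs: readonly string[],
  report: (failures: string[]) => void,
  remove: (dir: string) => Promise<void> = dir => fs.rm(dir, { recursive: true, force: true })
): Promise<string[]> {
  const failures: string[] = [];
  await Promise.all(
    dirs.map(async dir => {
      try {
        await remove(dir);
      } catch {
        failures.push(dir);
      }
    })
  );
  // Only surface an error to the user if something actually went wrong.
  if (failures.length > 0) {
    report(failures);
  }
  return failures;
}
```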

Comment thread extensions/ql-vscode/src/databases-ui.ts
Comment thread extensions/ql-vscode/src/extension.ts
Comment thread extensions/ql-vscode/package.json Outdated
},
{
"command": "codeQLDatabases.removeOrphanedDatabases",
"title": "Remove databases no longer imported into VS Code"
Contributor

Clean up unused databases?

Contributor Author

This is not really a user facing command. But, sure.

@aeisenberg aeisenberg added the Complexity: Low A good task for newcomers to learn, or experienced team members to complete quickly. label Nov 9, 2020
@aeisenberg aeisenberg force-pushed the aeisenberg/clean-orphan-dbs branch 3 times, most recently from 6f410d1 to 300fb21 on November 10, 2020 20:56
@aeisenberg aeisenberg force-pushed the aeisenberg/clean-orphan-dbs branch from 300fb21 to 4b11e5d on November 10, 2020 22:39
@alexet alexet merged commit e0cd041 into github:main Nov 16, 2020
@aeisenberg aeisenberg deleted the aeisenberg/clean-orphan-dbs branch November 24, 2020 22:30

Labels

Complexity: Low A good task for newcomers to learn, or experienced team members to complete quickly.

Development

Successfully merging this pull request may close these issues.

Periodically and on demand clean up old databases in the extension

5 participants