UniProtKB/Swiss-Prot protein knowledgebase release 2023_01 statistics
1. INTRODUCTION
Release 2023_01 of 22-Feb-2023 of UniProtKB/Swiss-Prot contains 569213 sequence
entries, curated from 291046 unique references and comprising 205728242 amino acids.
479 sequences have been added since release 2022_05, the sequence data of
99 existing entries has been updated and the annotations of
544898 entries have been revised.
Number of fragments: 9289
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 40914
Protein existence (PE): entries %
1: Evidence at protein level 111580 19.6%
2: Evidence at transcript level 55959 9.8%
3: Inferred from homology 386735 67.9%
4: Predicted 13102 2.3%
5: Uncertain 1837 0.3%
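The percentages in the table above can be recomputed from the entry counts. The Python sketch below is illustrative only: the variable names are assumptions and are not part of the release files, and it assumes the percentages are rounded to one decimal place.

```python
# Entry counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 111580,
    "2: Evidence at transcript level": 55959,
    "3: Inferred from homology": 386735,
    "4: Predicted": 13102,
    "5: Uncertain": 1837,
}

# The five PE levels together should cover every entry in the release.
total_entries = sum(pe_counts.values())

# Share of each PE level in percent, rounded to one decimal place.
pe_percent = {
    level: round(100 * count / total_entries, 1)
    for level, count in pe_counts.items()
}

for level, pct in pe_percent.items():
    print(f"{level:35s} {pe_counts[level]:7d} {pct:5.1f}%")
```

The sum of the five counts equals the 569213 entries stated in the introduction, so every entry carries exactly one PE level.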
The growth of the database is summarized below.
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 14403
The twenty most represented species account for 122631 sequences, or 21.5% of the
total number of entries.
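The figure for the top twenty species can be reproduced from ranks 1 to 20 of Table 2.2. The following Python sketch is a hedged illustration; the names `top20` and `top20_share` are assumptions introduced here.

```python
# Frequencies of the twenty most represented species (Table 2.2, ranks 1-20).
top20 = [
    20422, 17141, 16299, 8177, 6727, 6035, 5121, 4530, 4429, 4191,
    4182, 4159, 3708, 3485, 3267, 2304, 2291, 2218, 2046, 1899,
]

total_entries = 569213  # number of entries in release 2023_01 (section 1)

top20_total = sum(top20)
# Percentage of all entries, rounded to one decimal place.
top20_share = round(100 * top20_total / total_entries, 1)

print(top20_total, top20_share)
```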
2.1 Table of the frequency of occurrence of species
Species represented 1x: 5866
2x: 2086
3x: 1115
4x: 770
5x: 524
6x: 438
7x: 327
8x: 273
9x: 239
10x: 151
11- 20x: 829
21- 50x: 501
51-100x: 227
>100x: 1057
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20422 Homo sapiens (Human)
2 17141 Mus musculus (Mouse)
3 16299 Arabidopsis thaliana (Mouse-ear cress)
4 8177 Rattus norvegicus (Rat)
5 6727 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
6 6035 Bos taurus (Bovine)
7 5121 Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
8 4530 Escherichia coli (strain K12)
9 4429 Caenorhabditis elegans
10 4191 Bacillus subtilis (strain 168)
11 4182 Oryza sativa subsp. japonica (Rice)
12 4159 Dictyostelium discoideum (Social amoeba)
13 3708 Drosophila melanogaster (Fruit fly)
14 3485 Xenopus laevis (African clawed frog)
15 3267 Danio rerio (Zebrafish) (Brachydanio rerio)
16 2304 Gallus gallus (Chicken)
17 2291 Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
18 2218 Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
19 2046 Escherichia coli O157:H7
20 1899 Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)
21 1818 Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
22 1787 Methanocaldococcus jannaschii
23 1709 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
24 1704 Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
25 1702 Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)
26 1695 Shigella flexneri
27 1458 Sus scrofa (Pig)
28 1441 Pseudomonas aeruginosa
29 1347 Salmonella typhi
30 1244 Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)
31 1176 Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
32 1108 Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
33 1087 Synechocystis sp. (strain PCC 6803 / Kazusa)
34 1036 Archaeoglobus fulgidus
35 1027 Yersinia pestis
36 1004 Emericella nidulans
37 992 Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
38 941 Staphylococcus aureus (strain Mu50 / ATCC 700699)
39 930 Salmonella paratyphi A (strain ATCC 9150 / SARB42)
40 929 Staphylococcus aureus (strain N315)
41 928 Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)
42 924 Kluyveromyces lactis (Yeast) (Candida sphaerica)
43 909 Acanthamoeba polyphaga mimivirus (APMV)
44 905 Staphylococcus aureus (strain COL)
45 897 Oryctolagus cuniculus (Rabbit)
46 896 Staphylococcus aureus (strain MW2)
47 895 Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100)
48 894 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
49 890 Staphylococcus aureus (strain MSSA476)
50 888 Staphylococcus aureus (strain MRSA252)
51 887 Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)
52 883 Neurospora crassa
53 882 Candida glabrata (Yeast) (Torulopsis glabrata)
54 882 Salmonella choleraesuis (strain SC-B67)
55 879 Shigella sonnei (strain Ss046)
56 867 Oryza sativa subsp. indica (Rice)
57 863 Yersinia pseudotuberculosis serotype I (strain IP32953)
58 847 Escherichia coli O9:H4 (strain HS)
59 847 Canis lupus familiaris (Dog) (Canis familiaris)
60 842 Zea mays (Maize)
61 838 Escherichia coli O139:H28 (strain E24377A / ETEC)
62 829 Shigella boydii serotype 4 (strain Sb227)
63 825 Escherichia coli (strain UTI89 / UPEC)
64 822 Escherichia coli
65 822 Shigella dysenteriae serotype 1 (strain Sd197)
66 812 Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
67 804 Staphylococcus aureus (strain NCTC 8325 / PS 47)
68 803 Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672)
69 795 Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)
70 791 Escherichia coli (strain SMS-3-5 / SECEC)
71 787 Aquifex aeolicus (strain VF5)
72 779 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
73 771 Escherichia coli (strain K12 / DH10B)
74 770 Pasteurella multocida (strain Pm70)
75 766 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
76 765 Escherichia coli (strain K12 / MC4100 / BW2952)
77 762 Escherichia coli (strain 55989 / EAEC)
78 761 Escherichia coli O8 (strain IAI1)
79 760 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
80 760 Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)
81 760 Shigella flexneri serotype 5b (strain 8401)
82 758 Escherichia coli O45:K1 (strain S88 / ExPEC)
83 757 Bacillus anthracis
84 756 Escherichia coli (strain SE11)
85 753 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
86 749 Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)
87 748 Escherichia coli O157:H7 (strain EC4115 / EHEC)
88 744 Halalkalibacterium halodurans
89 739 Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)
90 733 Vibrio vulnificus (strain CMCP6)
91 731 Escherichia coli O81 (strain ED1a)
92 729 Pseudomonas putida
93 722 Salmonella enteritidis PT4 (strain P125109)
94 718 Vibrio vulnificus (strain YJ016)
95 716 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
96 715 Yersinia pestis bv. Antiqua (strain Nepal516)
97 715 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
98 715 Enterobacter sp. (strain 638)
99 714 Salmonella paratyphi A (strain AKU_12601)
100 714 Escherichia coli O1:K1 / APEC
101 713 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
102 713 Salmonella agona (strain SL483)
103 713 Salmonella newport (strain SL254)
104 712 Salmonella schwarzengrund (strain CVM19633)
105 711 Yersinia pestis bv. Antiqua (strain Antiqua)
106 710 Salmonella heidelberg (strain SL476)
107 708 Escherichia coli
108 702 Salmonella dublin (strain CT_02021853)
109 699 Klebsiella pneumoniae (strain 342)
110 698 Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)
111 698 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
112 695 Escherichia fergusonii
113 692 Pan troglodytes (Chimpanzee)
114 686 Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1)
115 684 Salmonella gallinarum (strain 287/91 / NCTC 13346)
116 681 Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)
117 678 Staphylococcus aureus (strain USA300)
118 678 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
119 672 Serratia proteamaculans (strain 568)
120 669 Mycobacterium leprae (strain TN)
121 669 Bacillus cereus
122 667 Yersinia pestis (strain Pestoides F)
123 666 Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)
124 664 Bradyrhizobium diazoefficiens
125 658 Sinorhizobium fredii (strain NBRC 101917 / NGR234)
126 654 Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens
127 654 Shewanella oneidensis (strain MR-1)
128 653 Debaryomyces hansenii
129 643 Staphylococcus aureus (strain bovine RF122 / ET3-1)
130 642 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
131 642 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
132 634 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
133 622 Treponema pallidum (strain Nichols)
134 622 Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)
135 622 Methanothermobacter thermautotrophicus
136 621 Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)
137 616 Pseudomonas aeruginosa (strain UCBPP-PA14)
138 615 Xanthomonas campestris pv. campestris
139 614 Staphylococcus haemolyticus (strain JCSC1435)
140 613 Mesorhizobium japonicum (Mesorhizobium loti
141 612 Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)
142 605 Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)
143 603 Ralstonia solanacearum (strain GMI1000) (Pseudomonas solanacearum)
144 602 Photobacterium profundum (strain SS9)
145 602 Staphylococcus saprophyticus subsp. saprophyticus
146 601 Salmonella paratyphi C (strain RKS4594)
147 600 Yersinia pestis bv. Antiqua (strain Angola)
148 595 Bacillus cereus (strain ATCC 10987 / NRS 248)
149 591 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
150 587 Neisseria meningitidis serogroup B (strain MC58)
151 586 Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
152 584 Rickettsia prowazekii (strain Madrid E)
153 582 Caenorhabditis briggsae
154 579 Brucella suis biovar 1 (strain 1330)
155 575 Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)
156 575 Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)
157 573 Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)
158 572 Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS)
159 569 Bacillus thuringiensis subsp. konkukian (strain 97-27)
160 568 Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)
161 567 Pseudomonas syringae pv. syringae (strain B728a)
162 566 Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
163 565 Bacillus licheniformis
164 562 Bacillus cereus (strain ZK / E33L)
165 562 Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)
166 560 Thermotoga maritima
167 559 Clostridium acetobutylicum
168 557 Xanthomonas axonopodis pv. citri (strain 306)
169 555 Pseudomonas fluorescens (strain Pf0-1)
170 554 Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)
171 554 Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)
172 553 Oceanobacillus iheyensis
173 547 Pseudomonas savastanoi pv. phaseolicola (Pseudomonas syringae pv. phaseolicola
174 540 Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)
175 539 Corynebacterium glutamicum
176 531 Erwinia tasmaniensis
177 529 Sodalis glossinidius (strain morsitans)
178 529 Listeria monocytogenes serotype 4b (strain F2365)
179 528 Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50)
180 524 Staphylococcus aureus (strain Newman)
181 522 Xylella fastidiosa (strain 9a5c)
182 522 Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)
183 521 Deinococcus radiodurans
184 519 Chromobacterium violaceum
185 519 Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)
186 516 Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)
187 515 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
188 514 Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)
189 512 Pseudomonas aeruginosa (strain PA7)
190 512 Geobacillus kaustophilus (strain HTA426)
191 511 Streptomyces avermitilis
192 510 Haemophilus ducreyi (strain 35000HP / ATCC 700724)
193 508 Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)
194 507 Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
195 505 Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)
196 504 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
197 504 Pseudomonas entomophila (strain L48)
198 503 Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)
199 503 Nicotiana tabacum (Common tobacco)
200 499 Haemophilus influenzae (strain 86-028NP)
201 499 Brucella abortus biovar 1 (strain 9-941)
202 497 Burkholderia pseudomallei (strain K96243)
203 496 Proteus mirabilis (strain HI4320)
204 496 Rickettsia conorii (strain ATCC VR-613 / Malish 7)
205 496 Alkalihalobacillus clausii (strain KSM-K16) (Bacillus clausii)
206 495 Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)
207 494 Xanthomonas campestris pv. campestris (strain 8004)
208 494 Pyrococcus horikoshii
209 492 Methanosarcina mazei
210 492 Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805)
211 492 Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42)
212 491 Vibrio campbellii (strain ATCC BAA-1116)
213 491 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
214 491 Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1)
215 491 Brucella abortus (strain 2308)
216 489 Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2)
217 487 Shewanella sp. (strain MR-7)
218 486 Mannheimia succiniciproducens (strain MBEL55E)
219 484 Shewanella sp. (strain MR-4)
220 484 Staphylococcus aureus (strain Mu3 / ATCC 700698)
221 484 Pseudomonas aeruginosa (strain LESB58)
222 483 Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37)
223 483 Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
224 479 Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)
225 477 Pyrococcus abyssi (strain GE5 / Orsay)
226 475 Cupriavidus necator
227 475 Burkholderia lata
228 473 Campylobacter jejuni subsp. jejuni serotype O:2
229 472 Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)
230 469 Cereibacter sphaeroides
231 469 Clostridium perfringens (strain 13 / Type A)
232 468 Pseudomonas putida (strain GB-1)
233 468 Enterococcus faecalis (strain ATCC 700802 / V583)
234 468 Shewanella sp. (strain ANA-3)
235 467 Shewanella frigidimarina (strain NCIMB 400)
236 467 Aeromonas hydrophila subsp. hydrophila
237 466 Xanthomonas campestris pv. vesicatoria (strain 85-10)
238 465 Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)
239 463 Burkholderia mallei (strain ATCC 23344)
240 461 Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator
241 460 Ovis aries (Sheep)
242 460 Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)
243 457 Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)
244 455 Staphylococcus aureus (strain JH1)
245 455 Shewanella baltica (strain OS185)
246 455 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
247 453 Pseudomonas putida (strain W619)
248 453 Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10)
249 453 Streptococcus mutans serotype c (strain ATCC 700610 / UA159)
250 452 Aeromonas salmonicida (strain A449)
2.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 19701 ( 3%)
Bacteria 335807 ( 59%)
Eukaryota 196401 ( 35%)
Viruses 17304 ( 3%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 20423 ( 10%) ( 4%)
Other Mammalia 47248 ( 24%) ( 8%)
Other Vertebrata 18865 ( 10%) ( 3%)
Viridiplantae 41538 ( 21%) ( 7%)
Fungi 36325 ( 18%) ( 6%)
Insecta 9666 ( 5%) ( 2%)
Nematoda 5342 ( 3%) ( 1%)
Other 16994 ( 9%) ( 3%)
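Each row of the Eukaryota table carries two percentages with different denominators: the Eukaryota total and the complete database. The Python sketch below shows one way to derive both, assuming rounding to whole percent; the dictionary and variable names are assumptions.

```python
total_entries = 569213  # complete database (section 1)

# Sequence counts per category, copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 20423,
    "Other Mammalia": 47248,
    "Other Vertebrata": 18865,
    "Viridiplantae": 41538,
    "Fungi": 36325,
    "Insecta": 9666,
    "Nematoda": 5342,
    "Other": 16994,
}

# The categories should add up to the Eukaryota row of the kingdom table.
eukaryota_total = sum(eukaryota.values())

# (% of Eukaryota, % of the complete database) for each category.
shares = {
    name: (round(100 * n / eukaryota_total), round(100 * n / total_entries))
    for name, n in eukaryota.items()
}

for name, (pct_euk, pct_db) in shares.items():
    print(f"{name:18s} {eukaryota[name]:6d} ({pct_euk:3d}%) ({pct_db:3d}%)")
```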
3. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 9932 1001-1100 4087
51- 100 43458 1101-1200 2875
101- 150 59740 1201-1300 2196
151- 200 59478 1301-1400 2064
201- 250 58356 1401-1500 1665
251- 300 52291 1501-1600 823
301- 350 52758 1601-1700 639
351- 400 45798 1701-1800 585
401- 450 37617 1801-1900 502
451- 500 30498 1901-2000 395
501- 550 22231 2001-2100 271
551- 600 15787 2101-2200 385
601- 650 13126 2201-2300 339
651- 700 9368 2301-2400 234
701- 750 7843 2401-2500 192
751- 800 5680 >2500 1452
801- 850 4876
851- 900 5285
901- 950 4105
951-1000 2993
The average sequence length in UniProtKB/Swiss-Prot is 361 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
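The average length is consistent with the totals given in the introduction. The short Python sketch below assumes the average is taken over all 569213 entries (fragments included) and rounded to the nearest integer; that assumption is not stated in the release notes.

```python
total_amino_acids = 205728242  # from section 1
total_entries = 569213         # from section 1

# Mean number of amino acids per entry.
average_length = total_amino_acids / total_entries

print(round(average_length))
```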
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3088
4.1 Table of the frequency of journal citations
Journals cited 1x: 988
2x: 429
3x: 209
4x: 136
5x: 131
6x: 83
7x: 68
8x: 75
9x: 47
10x: 36
11- 20x: 240
21- 50x: 256
51-100x: 135
>100x: 255
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Number Citations Journal name
------ --------- -------------------------------------------------------------
1 26704 Journal of Biological Chemistry
2 12430 Proceedings of the National Academy of Sciences of the U.S.A.
3 7102 Journal of Bacteriology
4 5967 Biochemical and Biophysical Research Communications
5 5761 Biochemistry
6 5258 Nucleic Acids Research
7 5041 FEBS Letters
8 4987 Nature
9 4906 The EMBO Journal
10 4881 Gene
11 4539 Journal of Molecular Biology
12 4524 Molecular and Cellular Biology
13 3952 Biochimica et Biophysica Acta
14 3794 Cell
15 3548 Journal of Virology
16 3496 European Journal of Biochemistry
17 3297 Science
18 3106 Biochemical Journal
19 2796 Molecular Microbiology
20 2793 Plant Physiology
21 2542 Genomics
22 2494 PLoS ONE
23 2398 The American Journal of Human Genetics
24 2321 Journal of Cell Biology
25 2179 The Plant Cell
26 2026 The Plant Journal
27 1965 Human Molecular Genetics
28 1927 Genes and Development
29 1917 Plant Molecular Biology
30 1886 Virology
31 1834 Nature Genetics
32 1789 Development
33 1787 Molecular Biology of the Cell
34 1732 Molecular Cell
35 1661 Journal of Immunology
36 1636 Human Mutation
37 1565 Oncogene
38 1446 Structure
39 1424 Molecular and General Genetics
40 1401 Journal of Biochemistry
41 1398 Genetics
42 1357 Journal of Cell Science
43 1251 Blood
44 1240 Infection and Immunity
45 1183 Journal of General Virology
46 1162 Developmental Biology
47 1155 Microbiology
48 1136 Archives of Biochemistry and Biophysics
49 1124 Current Biology
50 1053 Nature Communications
51 1010 Applied and Environmental Microbiology
52 1004 Journal of Neuroscience
53 983 Acta Crystallographica, Section D
54 921 Cancer Research
55 896 FEMS Microbiology Letters
56 879 Toxicon
57 860 PLoS Genetics
58 844 Yeast
59 842 Protein Science
60 837 Journal of Clinical Investigation
61 830 American Journal of Physiology
62 810 Neuron
63 792 Scientific Reports
64 759 Plant and Cell Physiology
65 742 Human Genetics
66 739 The Journal of Experimental Medicine
67 687 Journal of Medical Genetics
68 682 Proteins
69 667 Mechanisms of Development
70 655 The FEBS Journal
71 648 Nature Structural Biology
72 629 Nature Structural and Molecular Biology
73 624 Nature Cell Biology
74 617 Bioscience, Biotechnology, and Biochemistry
75 597 PLoS Pathogens
76 588 Current Genetics
77 573 Developmental Cell
78 566 Journal of Neurochemistry
79 550 Molecular Endocrinology
80 544 The Journal of Clinical Endocrinology and Metabolism
81 537 Endocrinology
82 536 Antimicrobial Agents and Chemotherapy
83 506 Molecular and Biochemical Parasitology
84 494 Mammalian Genome
85 493 Journal of the American Chemical Society
86 488 Experimental Cell Research
87 472 Eukaryotic Cell
88 467 Peptides
89 457 Journal of Experimental Botany
90 455 Planta
91 451 RNA
92 433 Immunogenetics
93 431 The FASEB Journal
94 430 EMBO Reports
95 428 American Journal of Medical Genetics. Part A
96 425 Molecular Pharmacology
97 417 Acta Crystallographica, Section F
98 417 Molecular Biology and Evolution
99 410 Cell Reports
100 407 Journal of Molecular Evolution
101 407 European Journal of Human Genetics
102 403 Immunity
103 397 Molecular Plant-Microbe Interactions
104 395 Journal of Investigative Dermatology
105 393 DNA and Cell Biology
106 388 Neurology
107 380 DNA Sequence
108 374 Biochimie
109 373 Clinical Genetics
110 371 Biology of Reproduction
111 368 Comparative Biochemistry and Physiology
112 357 Virus Research
113 355 Genes to Cells
114 342 Brain Research. Molecular Brain Research
115 340
116 339 Journal of Lipid Research
117 333 Developmental Dynamics
118 333 The New England Journal of Medicine
119 327 Annals of Neurology
120 327 Nature Immunology
121 317 BMC Genomics
122 317 PLoS Biology
123 313 Applied Microbiology and Biotechnology
124 308 European Journal of Immunology
125 306 Genome Research
126 304 Investigative Ophthalmology and Visual Science
127 301 Journal of Medicinal Chemistry
128 299 Biological Chemistry Hoppe-Seyler
129 293 Journal of Human Genetics
130 281 Cytogenetics and Cell Genetics
131 276 Journal of General Microbiology
132 275 Glycobiology
133 271 Archives of Microbiology
134 256 Traffic
135 255 Nature Chemical Biology
136 252 Phytochemistry
137 251 Molecular Immunology
138 250 Molecular Genetics and Metabolism
139 248 Journal of Cellular Biochemistry
140 248 Brain
141 247 Nature Medicine
142 245 Protein Expression and Purification
143 242 Fungal Genetics and Biology
144 240 Cell Cycle
145 234 DNA Research
146 231 Circulation Research
147 229 Diabetes
148 227 Archives of Virology
149 221 Cell Research
150 218 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
5. STATISTICS FOR SOME LINE TYPES
The following tables give, for selected UniProtKB/Swiss-Prot line types, the total
number of lines, the number of entries with at least one such line, and the
average number of lines per entry. Rank orders the subtypes by total number.
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL) 1290880 2.27
Journal 1117364 471204 1.96 1
Submitted to EMBL/GenBank/DDBJ 162144 146329 0.28 2
Submitted to other databases 7712 7058 0.01 3
Book citation 1861 1838 <0.01 4
Plant Gene Register 612 599 <0.01 5
Unpublished observations 510 506 <0.01 6
Thesis 457 454 <0.01 7
Patent 214 207 <0.01 8
Worm Breeder's Gazette 6 6 <0.01 9
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 456645
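The "Average per entry" column in these tables appears to be the total number of lines divided by the 569213 entries in the release, printed with two decimals, with values that would print as 0.00 shown as "<0.01". The Python sketch below illustrates that reading; the function name `average_per_entry` is hypothetical and the rounding rule is an assumption inferred from the table values.

```python
TOTAL_ENTRIES = 569213  # entries in release 2023_01 (section 1)


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format the average number of lines per entry as shown in the tables."""
    formatted = f"{total_lines / total_entries:.2f}"
    # Averages that round to 0.00 are reported as "<0.01" (assumed convention).
    return "<0.01" if formatted == "0.00" else formatted


# A few rows from the References (RL) table.
for name, total in [
    ("References (RL)", 1290880),
    ("Journal", 1117364),
    ("Submitted to other databases", 7712),
    ("Book citation", 1861),
]:
    print(f"{name:30s} {total:8d} {average_per_entry(total):>6s}")
```

Note that the denominator is the total number of entries, not the number of entries that carry the line type.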
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 2710313 4.76
ACTIVITY REGULATION 17500 17386 0.03 17
ALLERGEN 944 944 <0.01 26
ALTERNATIVE PRODUCTS 25785 25785 0.05 13
BIOPHYSICOCHEMICAL PROPERTIES 11010 10967 0.02 20
BIOTECHNOLOGY 1783 1730 <0.01 24
CATALYTIC ACTIVITY 332534 252136 0.58 4
CAUTION 14281 13975 0.03 18
COFACTOR 132209 120111 0.23 7
DEVELOPMENTAL STAGE 13975 13902 0.02 19
DISEASE 8140 5464 0.01 21
DISRUPTION PHENOTYPE 19655 19630 0.03 16
DOMAIN 57363 48948 0.10 9
FUNCTION 486488 462639 0.85 2
INDUCTION 25019 24942 0.04 14
INTERACTION 23981 23981 0.04 15
MASS SPECTROMETRY 7456 5750 0.01 22
MISCELLANEOUS 45339 39830 0.08 11
PATHWAY 143522 129570 0.25 6
PHARMACEUTICAL 165 158 <0.01 29
POLYMORPHISM 1452 1343 <0.01 25
PTM 63212 45240 0.11 8
RNA EDITING 631 631 <0.01 28
SEQUENCE CAUTION 45077 45007 0.08 12
SIMILARITY 517720 513449 0.91 1
SUBCELLULAR LOCATION 362803 354462 0.64 3
SUBUNIT 294539 289407 0.52 5
TISSUE SPECIFICITY 50334 49871 0.09 10
TOXIC DOSE 847 677 <0.01 27
WEB RESOURCE 6549 5544 0.01 23
Total number of comment topics: 29
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 5236848 9.20
ACT_SITE 174890 104629 0.31 9
BINDING 1178760 215224 2.07 1
CARBOHYD 122879 31289 0.22 14
CHAIN 577507 561640 1.01 2
COILED 22407 15518 0.04 25
COMPBIAS 173910 73855 0.31 10
CONFLICT 138655 48339 0.24 12
CROSSLNK 24619 8864 0.04 24
DISULFID 133405 35543 0.23 13
DNA_BIND 12132 10857 0.02 31
DOMAIN 212859 130508 0.37 8
HELIX 319527 28072 0.56 5
INIT_MET 17506 17457 0.03 26
INTRAMEM 3023 1387 0.01 34
LIPID 13703 8790 0.02 28
MOD_RES 260509 74269 0.46 7
MOTIF 47105 30736 0.08 21
MUTAGEN 91200 19068 0.16 17
NON_CONS 2548 826 <0.01 35
NON_STD 358 283 <0.01 36
NON_TER 12622 9692 0.02 29
PEPTIDE 12464 8646 0.02 30
PROPEP 15128 12914 0.03 27
REGION 317527 149307 0.56 6
REPEAT 108801 15130 0.19 15
SIGNAL 43992 43991 0.08 22
SITE 64551 35023 0.11 19
STRAND 326700 26466 0.57 4
TOPO_DOM 148598 30219 0.26 11
TRANSIT 9490 9370 0.02 32
TRANSMEM 380067 79656 0.67 3
TURN 77302 22891 0.14 18
UNSURE 5741 891 0.01 33
VAR_SEQ 52937 22548 0.09 20
VARIANT 102839 17424 0.18 16
ZN_FING 30587 13065 0.05 23
Total number of feature keys: 36
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 20577518 36.15
ABCD 3011 3011 0.01 121 Protocols and materials databases
AGR 60696 60026 0.11 43 Organism-specific databases
Allergome 2034 1308 <0.01 129 Protein family/group databases
AlphaFoldDB 545443 545443 0.96 9 3D structure databases
Antibodypedia 32200 32091 0.06 59 Protocols and materials databases
ArachnoServer 1164 1154 <0.01 139 Organism-specific databases
Araport 16317 16221 0.03 90 Organism-specific databases
Bgee 61178 61178 0.11 41 Gene expression databases
BindingDB 6413 6413 0.01 107 Chemistry databases
BioCyc 58694 54700 0.10 45 Enzyme and pathway databases
BioGRID 61008 59122 0.11 42 Protein-protein interaction databases
BioGRID-ORCS 44869 44286 0.08 54 Miscellaneous databases
BioMuta 20309 20283 0.04 74 Genetic variation databases
BMRB 6899 6899 0.01 104 3D structure databases
BRENDA 20244 18446 0.04 76 Enzyme and pathway databases
CarbonylDB 1159 1159 <0.01 140 PTM databases
CAZy 9596 8643 0.02 98 Protein family/group databases
CCDS 49372 34555 0.09 51 Sequence databases
CDD 371049 294774 0.65 16 Family and domain databases
CGD 2090 2073 <0.01 128 Organism-specific databases
ChEMBL 8820 8623 0.02 99 Chemistry databases
ChiTaRS 29712 29668 0.05 63 Miscellaneous databases
CLAE 359 356 <0.01 155 Protein family/group databases
CollecTF 137 137 <0.01 162 Gene expression databases
ComplexPortal 13247 7251 0.02 96 Protein-protein interaction databases
COMPLUYEAST-2DPAGE 97 97 <0.01 164 2D gel databases
ConoServer 967 879 <0.01 142 Organism-specific databases
CORUM 5811 5811 0.01 108 Protein-protein interaction databases
CPTAC 2525 1632 <0.01 124 Proteomic databases
CPTC 374 374 <0.01 153 Protocols and materials databases
CTD 76045 75142 0.13 39 Organism-specific databases
DEPOD 254 254 <0.01 160 PTM databases
dictyBase 4224 4110 0.01 115 Organism-specific databases
DIP 17527 17487 0.03 86 Protein-protein interaction databases
DisGeNET 17015 16796 0.03 88 Organism-specific databases
DisProt 1750 1742 <0.01 131 Family and domain databases
DMDM 16173 16173 0.03 91 Genetic variation databases
DNASU 48254 48176 0.08 52 Protocols and materials databases
DOSAC-COBS-2DPAGE 145 145 <0.01 161 2D gel databases
DrugBank 30130 4742 0.05 62 Chemistry databases
DrugCentral 2564 2564 <0.01 123 Chemistry databases
EchoBASE 4158 4158 0.01 116 Organism-specific databases
eggNOG 338503 332694 0.59 18 Phylogenomic databases
ELM 1813 1813 <0.01 130 Protein-protein interaction databases
EMBL 1002102 556568 1.76 3 Sequence databases
Ensembl 98121 48400 0.17 36 Genome annotation databases
EnsemblBacteria 309724 298242 0.54 21 Genome annotation databases
EnsemblFungi 22947 22519 0.04 68 Genome annotation databases
EnsemblMetazoa 18704 11357 0.03 82 Genome annotation databases
EnsemblPlants 30932 21987 0.05 60 Genome annotation databases
EnsemblProtists 5293 5042 0.01 111 Genome annotation databases
EPD 23236 23236 0.04 67 Proteomic databases
ESTHER 2974 2973 0.01 122 Protein family/group databases
euHCVdb 55 44 <0.01 166 Organism-specific databases
EvolutionaryTrace 16750 16750 0.03 89 Miscellaneous databases
ExpressionAtlas 52753 52753 0.09 49 Gene expression databases
FlyBase 4111 3996 0.01 117 Organism-specific databases
Gene3D 737249 457643 1.30 6 Family and domain databases
GeneCards 20341 20202 0.04 72 Organism-specific databases
GeneID 323033 286927 0.57 20 Genome annotation databases
GeneReviews 1556 1553 <0.01 132 Organism-specific databases
GeneTree 58906 58895 0.10 44 Phylogenomic databases
Genevisible 55270 55270 0.10 47 Gene expression databases
GeneWiki 10352 10269 0.02 97 Miscellaneous databases
GenomeRNAi 22243 22243 0.04 70 Miscellaneous databases
GlyConnect 2372 2215 <0.01 125 PTM databases
GlyCosmos 28903 28903 0.05 64 PTM databases
GlyGen 15684 15684 0.03 92 PTM databases
GO 3176069 545442 5.58 1 Ontologies
Gramene 30932 21987 0.05 61 Genome annotation databases
GuidetoPHARMACOLOGY 2137 2137 <0.01 127 Chemistry databases
HAMAP 330828 327896 0.58 19 Family and domain databases
HGNC 20365 20235 0.04 71 Organism-specific databases
HOGENOM 426187 426187 0.75 15 Phylogenomic databases
HPA 19324 19204 0.03 80 Organism-specific databases
IDEAL 986 986 <0.01 141 Family and domain databases
IMGT_GENE-DB 267 267 <0.01 159 Protein family/group databases
InParanoid 141829 141829 0.25 28 Phylogenomic databases
IntAct 56773 56773 0.10 46 Protein-protein interaction databases
InterPro 2387306 550043 4.19 2 Family and domain databases
iPTMnet 54133 54133 0.10 48 PTM databases
jPOST 26405 26405 0.05 65 Proteomic databases
KEGG 503343 478544 0.88 11 Genome annotation databases
LegioList 765 763 <0.01 146 Organism-specific databases
Leproma 672 669 <0.01 148 Organism-specific databases
MaizeGDB 528 524 <0.01 150 Organism-specific databases
MalaCards 5363 5358 0.01 109 Organism-specific databases
MANE-Select 18293 18049 0.03 84 Genome annotation databases
MassIVE 18718 18718 0.03 81 Proteomic databases
MaxQB 33703 33703 0.06 58 Proteomic databases
MEROPS 14174 13757 0.02 94 Protein family/group databases
MetOSite 3111 3111 0.01 120 PTM databases
MGI 17052 17011 0.03 87 Organism-specific databases
MIM 22942 15905 0.04 69 Organism-specific databases
MINT 23435 23435 0.04 66 Protein-protein interaction databases
MoonDB 348 348 <0.01 156 Protein family/group databases
MoonProt 281 281 <0.01 158 Protein family/group databases
neXtProt 20330 20330 0.04 73 Organism-specific databases
NIAGADS 69 69 <0.01 165 Organism-specific databases
OGP 373 373 <0.01 154 2D gel databases
OMA 430151 430151 0.76 14 Phylogenomic databases
OpenTargets 18386 18243 0.03 83 Organism-specific databases
Orphanet 8121 4348 0.01 101 Organism-specific databases
OrthoDB 274272 274272 0.48 23 Phylogenomic databases
PANTHER 1001095 500948 1.76 4 Family and domain databases
PathwayCommons 19457 19457 0.03 79 Enzyme and pathway databases
PATRIC 92820 92820 0.16 37 Genome annotation databases
PaxDb 126414 126414 0.22 31 Proteomic databases
PCDDB 127 127 <0.01 163 3D structure databases
PDB 256659 33056 0.45 24 3D structure databases
PDBsum 256659 33056 0.45 25 3D structure databases
PeptideAtlas 39579 39579 0.07 57 Proteomic databases
PeroxiBase 791 769 <0.01 145 Protein family/group databases
Pfam 820629 538122 1.44 5 Family and domain databases
PharmGKB 18034 18015 0.03 85 Organism-specific databases
Pharos 20231 20231 0.04 77 Miscellaneous databases
PHI-base 1525 1264 <0.01 133 Miscellaneous databases
PhosphoSitePlus 39622 39622 0.07 56 PTM databases
PhylomeDB 115376 115376 0.20 33 Phylogenomic databases
PIR 124911 114600 0.22 32 Sequence databases
PIRSF 110779 109612 0.19 34 Family and domain databases
PlantReactome 1278 750 <0.01 136 Enzyme and pathway databases
PomBase 5129 5125 0.01 112 Organism-specific databases
PRIDE 636 636 <0.01 149 Proteomic databases
PRINTS 150430 129197 0.26 27 Family and domain databases
PRO 98139 98139 0.17 35 Miscellaneous databases
ProMEX 484 484 <0.01 152 Proteomic databases
PROSITE 488418 309536 0.86 12 Family and domain databases
Proteomes 487848 461476 0.86 13 Miscellaneous databases
ProteomicsDB 72546 45270 0.13 40 Proteomic databases
PseudoCAP 1448 1439 <0.01 135 Organism-specific databases
Reactome 140640 37755 0.25 29 Enzyme and pathway databases
REBASE 796 399 <0.01 144 Protein family/group databases
RefSeq 622881 475767 1.09 8 Sequence databases
REPRODUCTION-2DPAGE 1260 1039 <0.01 137 2D gel databases
RGD 8104 8103 0.01 102 Organism-specific databases
RNAct 43057 43057 0.08 55 Miscellaneous databases
SABIO-RK 5328 5328 0.01 110 Enzyme and pathway databases
SASBDB 710 710 <0.01 147 3D structure databases
SFLD 20245 9031 0.04 75 Family and domain databases
SGD 6747 6742 0.01 106 Organism-specific databases
SignaLink 19961 19961 0.04 78 Enzyme and pathway databases
SIGNOR 7065 7065 0.01 103 Enzyme and pathway databases
SMART 204961 147794 0.36 26 Family and domain databases
SMR 508433 508433 0.89 10 3D structure databases
STRING 366062 366062 0.64 17 Protein-protein interaction databases
SUPFAM 646499 458240 1.14 7 Family and domain databases
SWISS-2DPAGE 1177 1177 <0.01 138 2D gel databases
SwissLipids 1478 1394 <0.01 134 Chemistry databases
SwissPalm 13324 13324 0.02 95 PTM databases
TAIR 15082 15025 0.03 93 Organism-specific databases
TCDB 8267 8196 0.01 100 Protein family/group databases
TIGRFAMs 295753 274529 0.52 22 Family and domain databases
TopDownProteomics 3234 2956 0.01 118 Proteomic databases
TreeFam 46040 46017 0.08 53 Phylogenomic databases
TubercuList 2310 2274 <0.01 126 Organism-specific databases
UCD-2DPAGE 496 496 <0.01 151 2D gel databases
UCSC 50762 46305 0.09 50 Genome annotation databases
UniLectin 312 312 <0.01 157 Protein family/group databases
UniPathway 139656 126046 0.25 30 Enzyme and pathway databases
VEuPathDB 79663 73836 0.14 38 Organism-specific databases
VGNC 4482 4468 0.01 114 Organism-specific databases
WBParaSite 53 48 <0.01 167 Genome annotation databases
World-2DPAGE 935 923 <0.01 143 2D gel databases
WormBase 6898 5033 0.01 105 Organism-specific databases
Xenbase 4619 4560 0.01 113 Organism-specific databases
ZFIN 3233 3233 0.01 119 Organism-specific databases
Total number of cross-referenced databases: 167
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.25 Gln (Q) 3.93 Leu (L) 9.65 Ser (S) 6.64
Arg (R) 5.53 Glu (E) 6.72 Lys (K) 5.80 Thr (T) 5.36
Asn (N) 4.06 Gly (G) 7.07 Met (M) 2.41 Trp (W) 1.10
Asp (D) 5.46 His (H) 2.27 Phe (F) 3.86 Tyr (Y) 2.92
Cys (C) 1.38 Ile (I) 5.91 Pro (P) 4.74 Val (V) 6.85
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
6.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
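The ordering in 6.2 follows directly from the percentages in 6.1. The Python sketch below sorts the twenty standard amino acids by decreasing frequency; the dictionary name is an assumption, and the ambiguity codes (Asx, Glx, Xaa) are left out.

```python
# Percent composition copied from Table 6.1 (ambiguity codes B, Z, X omitted).
composition = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.64,
    "Arg": 5.53, "Glu": 6.72, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.74, "Val": 6.85,
}

# Three-letter codes ordered from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```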
7. MISCELLANEOUS STATISTICS
4465 entries are encoded on a mitochondrion, and 3976 are encoded on a plasmid.
12199 entries are encoded on a plastid,
of which 21 are encoded on apicoplasts,
11634 on chloroplasts,
51 on organellar chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
Number of entries with at least one sequence correction: 80918