


         UniProtKB/Swiss-Prot protein knowledgebase release 2023_01 statistics





1.  INTRODUCTION



Release 2023_01 of 22-Feb-2023 of UniProtKB/Swiss-Prot contains 569213 sequence

entries, curated from 291046 unique references and comprising 205728242 amino acids. 



479 sequences have been added since release 2022_05, the sequence data of

99 existing entries has been updated and the annotations of

544898 entries have been revised.



Number of fragments: 9289

Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 40914





Protein existence (PE):           entries     %



1: Evidence at protein level       111580   19.6%

2: Evidence at transcript level     55959    9.8%

3: Inferred from homology          386735   67.9%

4: Predicted                        13102    2.3%

5: Uncertain                         1837    0.3%
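The percentages in the PE table follow directly from the entry counts. A minimal Python sketch, assuming the counts are copied by hand from this release; the names pe_counts and pe_percentages are illustrative and not part of any UniProt tool:

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 111580,
    "2: Evidence at transcript level": 55959,
    "3: Inferred from homology": 386735,
    "4: Predicted": 13102,
    "5: Uncertain": 1837,
}

# The five levels together account for every entry in the release.
total_entries = sum(pe_counts.values())


def pe_percentages(counts):
    """Return each level's share of all entries, rounded to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


for level, pct in pe_percentages(pe_counts).items():
    print(f"{level:<34}{pe_counts[level]:>7}{pct:>7.1f}%")
```

Because the shares are rounded independently, the percentage column need not add up to exactly 100.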



The growth of the database is summarized below.



   [Figure: growth of the database]





2.  TAXONOMIC ORIGIN



   Total number of species represented in this release of UniProtKB/Swiss-Prot: 14403



   The first twenty species represent 122631 sequences, or 21.5% of the total

   number of entries.
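   The top-twenty figure can be reproduced from the first twenty rows of
   table 2.2. A small Python sketch, assuming the frequencies are typed in
   from that table (top_twenty is an illustrative name):

```python
# Frequencies of the twenty most represented species, in table order
# (Homo sapiens first, Mycobacterium tuberculosis CDC 1551 last).
top_twenty = [
    20422, 17141, 16299, 8177, 6727, 6035, 5121, 4530, 4429, 4191,
    4182, 4159, 3708, 3485, 3267, 2304, 2291, 2218, 2046, 1899,
]

total_entries = 569213  # sequence entries in release 2023_01

top_sum = sum(top_twenty)
share = round(100 * top_sum / total_entries, 1)

print(f"{top_sum} sequences, {share}% of all entries")
```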





   2.1 Table of the frequency of occurrence of species



        Species represented 1x: 5866

                            2x: 2086

                            3x: 1115

                            4x:  770

                            5x:  524

                            6x:  438

                            7x:  327

                            8x:  273

                            9x:  239

                           10x:  151

                       11- 20x:  829

                       21- 50x:  501

                       51-100x:  227

                         >100x: 1057
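   As a consistency check, the frequency classes should add up to the total
   number of species. A Python sketch under the assumption that the bins
   above are exhaustive and non-overlapping (species_by_frequency is an
   illustrative name):

```python
# Number of species per frequency class, copied from table 2.1.
species_by_frequency = {
    "1x": 5866, "2x": 2086, "3x": 1115, "4x": 770, "5x": 524,
    "6x": 438, "7x": 327, "8x": 273, "9x": 239, "10x": 151,
    "11-20x": 829, "21-50x": 501, "51-100x": 227, ">100x": 1057,
}

total_species = sum(species_by_frequency.values())

# Share of species that are represented by a single entry only.
singleton_share = round(100 * species_by_frequency["1x"] / total_species, 1)

print(f"{total_species} species in total")
print(f"{singleton_share}% of species occur exactly once")
```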





   2.2  Table of the most represented species



  ------  ---------  --------------------------------------------

  Number  Frequency  Species

  ------  ---------  --------------------------------------------

       1      20422  Homo sapiens (Human)

       2      17141  Mus musculus (Mouse)

       3      16299  Arabidopsis thaliana (Mouse-ear cress)

       4       8177  Rattus norvegicus (Rat)

       5       6727  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)

       6       6035  Bos taurus (Bovine)

       7       5121  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)

       8       4530  Escherichia coli (strain K12)

       9       4429  Caenorhabditis elegans

      10       4191  Bacillus subtilis (strain 168)

      11       4182  Oryza sativa subsp. japonica (Rice)

      12       4159  Dictyostelium discoideum (Social amoeba)

      13       3708  Drosophila melanogaster (Fruit fly)

      14       3485  Xenopus laevis (African clawed frog)

      15       3267  Danio rerio (Zebrafish) (Brachydanio rerio)

      16       2304  Gallus gallus (Chicken)

      17       2291  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)

      18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)

      19       2046  Escherichia coli O157:H7

      20       1899  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)

      21       1818  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)

      22       1787  Methanocaldococcus jannaschii  

      23       1709  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)

      24       1704  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)

      25       1702  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)

      26       1695  Shigella flexneri

      27       1458  Sus scrofa (Pig)

      28       1441  Pseudomonas aeruginosa 

      29       1347  Salmonella typhi

      30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)

      31       1176  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)

      32       1108  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)

      33       1087  Synechocystis sp. (strain PCC 6803 / Kazusa)

      34       1036  Archaeoglobus fulgidus 

      35       1027  Yersinia pestis

      36       1004  Emericella nidulans  

      37        992  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)

      38        941  Staphylococcus aureus (strain Mu50 / ATCC 700699)

      39        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)

      40        929  Staphylococcus aureus (strain N315)

      41        928  Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)  

      42        924  Kluyveromyces lactis (Yeast) (Candida sphaerica)

      43        909  Acanthamoeba polyphaga mimivirus (APMV)

      44        905  Staphylococcus aureus (strain COL)

      45        897  Oryctolagus cuniculus (Rabbit)

      46        896  Staphylococcus aureus (strain MW2)

      47        895  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 

      48        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)

      49        890  Staphylococcus aureus (strain MSSA476)

      50        888  Staphylococcus aureus (strain MRSA252)

      51        887  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)

      52        883  Neurospora crassa 

      53        882  Candida glabrata (Yeast) (Torulopsis glabrata)

      54        882  Salmonella choleraesuis (strain SC-B67)

      55        879  Shigella sonnei (strain Ss046)

      56        867  Oryza sativa subsp. indica (Rice)

      57        863  Yersinia pseudotuberculosis serotype I (strain IP32953)

      58        847  Escherichia coli O9:H4 (strain HS)

      59        847  Canis lupus familiaris (Dog) (Canis familiaris)

      60        842  Zea mays (Maize)

      61        838  Escherichia coli O139:H28 (strain E24377A / ETEC)

      62        829  Shigella boydii serotype 4 (strain Sb227)

      63        825  Escherichia coli (strain UTI89 / UPEC)

      64        822  Escherichia coli 

      65        822  Shigella dysenteriae serotype 1 (strain Sd197)

      66        812  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)

      67        804  Staphylococcus aureus (strain NCTC 8325 / PS 47)

      68        803  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672) 

      69        795  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)

      70        791  Escherichia coli (strain SMS-3-5 / SECEC)

      71        787  Aquifex aeolicus (strain VF5)

      72        779  Escherichia coli O127:H6 (strain E2348/69 / EPEC)

      73        771  Escherichia coli (strain K12 / DH10B)

      74        770  Pasteurella multocida (strain Pm70)

      75        766  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)

      76        765  Escherichia coli (strain K12 / MC4100 / BW2952)

      77        762  Escherichia coli (strain 55989 / EAEC)

      78        761  Escherichia coli O8 (strain IAI1)

      79        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)

      80        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)

      81        760  Shigella flexneri serotype 5b (strain 8401)

      82        758  Escherichia coli O45:K1 (strain S88 / ExPEC)

      83        757  Bacillus anthracis

      84        756  Escherichia coli (strain SE11)

      85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)

      86        749  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)

      87        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)

      88        744  Halalkalibacterium halodurans  

      89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)

      90        733  Vibrio vulnificus (strain CMCP6)

      91        731  Escherichia coli O81 (strain ED1a)

      92        729  Pseudomonas putida 

      93        722  Salmonella enteritidis PT4 (strain P125109)

      94        718  Vibrio vulnificus (strain YJ016)

      95        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)

      96        715  Yersinia pestis bv. Antiqua (strain Nepal516)

      97        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)

      98        715  Enterobacter sp. (strain 638)

      99        714  Salmonella paratyphi A (strain AKU_12601)

     100        714  Escherichia coli O1:K1 / APEC

     101        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)

     102        713  Salmonella agona (strain SL483)

     103        713  Salmonella newport (strain SL254)

     104        712  Salmonella schwarzengrund (strain CVM19633)

     105        711  Yersinia pestis bv. Antiqua (strain Antiqua)

     106        710  Salmonella heidelberg (strain SL476)

     107        708  Escherichia coli

     108        702  Salmonella dublin (strain CT_02021853)

     109        699  Klebsiella pneumoniae (strain 342)

     110        698  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)

     111        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)

     112        695  Escherichia fergusonii 

     113        692  Pan troglodytes (Chimpanzee)

     114        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129 / Subtype 1) 

     115        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)

     116        681  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)

     117        678  Staphylococcus aureus (strain USA300)

     118        678  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)

     119        672  Serratia proteamaculans (strain 568)

     120        669  Mycobacterium leprae (strain TN)

     121        669  Bacillus cereus 

     122        667  Yersinia pestis (strain Pestoides F)

     123        666  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)

     124        664  Bradyrhizobium diazoefficiens 

     125        658  Sinorhizobium fredii (strain NBRC 101917 / NGR234)

     126        654  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens 

     127        654  Shewanella oneidensis (strain MR-1)

     128        653  Debaryomyces hansenii   

     129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)

     130        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)

     131        642  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)

     132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)

     133        622  Treponema pallidum (strain Nichols)

     134        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)

     135        622  Methanothermobacter thermautotrophicus  

     136        621  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)

     137        616  Pseudomonas aeruginosa (strain UCBPP-PA14)

     138        615  Xanthomonas campestris pv. campestris 

     139        614  Staphylococcus haemolyticus (strain JCSC1435)

     140        613  Mesorhizobium japonicum  (Mesorhizobium loti 

     141        612  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)

     142        605  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)

     143        603  Ralstonia solanacearum (strain GMI1000) (Pseudomonas solanacearum)

     144        602  Photobacterium profundum (strain SS9)

     145        602  Staphylococcus saprophyticus subsp. saprophyticus 

     146        601  Salmonella paratyphi C (strain RKS4594)

     147        600  Yersinia pestis bv. Antiqua (strain Angola)

     148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)

     149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)

     150        587  Neisseria meningitidis serogroup B (strain MC58)

     151        586  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) 

     152        584  Rickettsia prowazekii (strain Madrid E)

     153        582  Caenorhabditis briggsae

     154        579  Brucella suis biovar 1 (strain 1330)

     155        575  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)

     156        575  Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)

     157        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)

     158        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) 

     159        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)

     160        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)

     161        567  Pseudomonas syringae pv. syringae (strain B728a)

     162        566  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)

     163        565  Bacillus licheniformis 

     164        562  Bacillus cereus (strain ZK / E33L)

     165        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)

     166        560  Thermotoga maritima 

     167        559  Clostridium acetobutylicum 

     168        557  Xanthomonas axonopodis pv. citri (strain 306)

     169        555  Pseudomonas fluorescens (strain Pf0-1)

     170        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)

     171        554  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)

     172        553  Oceanobacillus iheyensis 

     173        547  Pseudomonas savastanoi pv. phaseolicola  (Pseudomonas syringae pv. phaseolicola 

     174        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)

     175        539  Corynebacterium glutamicum 

     176        531  Erwinia tasmaniensis 

     177        529  Sodalis glossinidius (strain morsitans)

     178        529  Listeria monocytogenes serotype 4b (strain F2365)

     179        528  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50) 

     180        524  Staphylococcus aureus (strain Newman)

     181        522  Xylella fastidiosa (strain 9a5c)

     182        522  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)

     183        521  Deinococcus radiodurans 

     184        519  Chromobacterium violaceum 

     185        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)

     186        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)

     187        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)

     188        514  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)

     189        512  Pseudomonas aeruginosa (strain PA7)

     190        512  Geobacillus kaustophilus (strain HTA426)

     191        511  Streptomyces avermitilis 

     192        510  Haemophilus ducreyi (strain 35000HP / ATCC 700724)

     193        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)

     194        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)

     195        505  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)

     196        504  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)

     197        504  Pseudomonas entomophila (strain L48)

     198        503  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)

     199        503  Nicotiana tabacum (Common tobacco)

     200        499  Haemophilus influenzae (strain 86-028NP)

     201        499  Brucella abortus biovar 1 (strain 9-941)

     202        497  Burkholderia pseudomallei (strain K96243)

     203        496  Proteus mirabilis (strain HI4320)

     204        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)

     205        496  Alkalihalobacillus clausii (strain KSM-K16) (Bacillus clausii)

     206        495  Thermosynechococcus vestitus (strain NIES-2133 / IAM M-273 / BP-1)

     207        494  Xanthomonas campestris pv. campestris (strain 8004)

     208        494  Pyrococcus horikoshii 

     209        492  Methanosarcina mazei  

     210        492  Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805) 

     211        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / LMG 26770 / FZB42) 

     212        491  Vibrio campbellii (strain ATCC BAA-1116)

     213        491  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)

     214        491  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) 

     215        491  Brucella abortus (strain 2308)

     216        489  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) 

     217        487  Shewanella sp. (strain MR-7)

     218        486  Mannheimia succiniciproducens (strain MBEL55E)

     219        484  Shewanella sp. (strain MR-4)

     220        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)

     221        484  Pseudomonas aeruginosa (strain LESB58)

     222        483  Mycoplasma genitalium (strain ATCC 33530 / DSM 19775 / NCTC 10195 / G37) 

     223        483  Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1) 

     224        479  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)

     225        477  Pyrococcus abyssi (strain GE5 / Orsay)

     226        475  Cupriavidus necator  

     227        475  Burkholderia lata 

     228        473  Campylobacter jejuni subsp. jejuni serotype O:2 

     229        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)

     230        469  Cereibacter sphaeroides  

     231        469  Clostridium perfringens (strain 13 / Type A)

     232        468  Pseudomonas putida (strain GB-1)

     233        468  Enterococcus faecalis (strain ATCC 700802 / V583)

     234        468  Shewanella sp. (strain ANA-3)

     235        467  Shewanella frigidimarina (strain NCIMB 400)

     236        467  Aeromonas hydrophila subsp. hydrophila 

     237        466  Xanthomonas campestris pv. vesicatoria (strain 85-10)

     238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)

     239        463  Burkholderia mallei (strain ATCC 23344)

     240        461  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator 

     241        460  Ovis aries (Sheep)

     242        460  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)

     243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)

     244        455  Staphylococcus aureus (strain JH1)

     245        455  Shewanella baltica (strain OS185)

     246        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)

     247        453  Pseudomonas putida (strain W619)

     248        453  Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10) 

     249        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)

     250        452  Aeromonas salmonicida (strain A449)





   

   2.3  Taxonomic distribution of the sequences



   



   Kingdom        sequences (% of the database)

    Archaea           19701 (  3%)

    Bacteria         335807 ( 59%)

    Eukaryota        196401 ( 35%)

    Viruses           17304 (  3%)





   Within Eukaryota:



   



    Category            sequences (% of Eukaryota) (% of the complete database)

     Human                  20423 ( 10%)           (  4%)

     Other Mammalia         47248 ( 24%)           (  8%)

     Other Vertebrata       18865 ( 10%)           (  3%)

     Viridiplantae          41538 ( 21%)           (  7%)

     Fungi                  36325 ( 18%)           (  6%)

     Insecta                 9666 (  5%)           (  2%)

     Nematoda                5342 (  3%)           (  1%)

     Other                  16994 (  9%)           (  3%)







3.  SEQUENCE SIZE



   Distribution of the sequences by size (excluding fragments)



               From   To  Number             From   To   Number

                  1-  50    9932             1001-1100     4087

                 51- 100   43458             1101-1200     2875

                101- 150   59740             1201-1300     2196

                151- 200   59478             1301-1400     2064

                201- 250   58356             1401-1500     1665

                251- 300   52291             1501-1600      823

                301- 350   52758             1601-1700      639

                351- 400   45798             1701-1800      585

                401- 450   37617             1801-1900      502

                451- 500   30498             1901-2000      395

                501- 550   22231             2001-2100      271

                551- 600   15787             2101-2200      385

                601- 650   13126             2201-2300      339

                651- 700    9368             2301-2400      234

                701- 750    7843             2401-2500      192

                751- 800    5680             >2500         1452

                801- 850    4876

                851- 900    5285

                901- 950    4105

                951-1000    2993



   





   The average sequence length in UniProtKB/Swiss-Prot is 361 amino acids.
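   The average is consistent with the totals given in the introduction. A
   Python sketch, assuming the average is taken over all entries (fragments
   included), which reproduces the reported figure:

```python
TOTAL_AMINO_ACIDS = 205728242  # amino acids in release 2023_01
TOTAL_ENTRIES = 569213         # sequence entries in release 2023_01

# Mean sequence length, before and after rounding to whole residues.
average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

print(f"{average_length:.2f} -> {round(average_length)} amino acids")
```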



   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.

   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.





4.  JOURNAL CITATIONS



   Note: the following citation statistics reflect the number of distinct

         journal citations.



   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 3088





   4.1 Table of the frequency of journal citations



        Journals cited 1x:  988

                       2x:  429

                       3x:  209

                       4x:  136

                       5x:  131

                       6x:   83

                       7x:   68

                       8x:   75

                       9x:   47

                      10x:   36

                  11- 20x:  240

                  21- 50x:  256

                  51-100x:  135

                    >100x:  255





   4.2  List of the most cited journals in UniProtKB/Swiss-Prot



   Nb    Citations   Journal name

   --    ---------   -------------------------------------------------------------

    1        26704   Journal of Biological Chemistry

    2        12430   Proceedings of the National Academy of Sciences of the U.S.A.

    3         7102   Journal of Bacteriology

    4         5967   Biochemical and Biophysical Research Communications

    5         5761   Biochemistry

    6         5258   Nucleic Acids Research

    7         5041   FEBS Letters

    8         4987   Nature

    9         4906   The EMBO Journal

   10         4881   Gene

   11         4539   Journal of Molecular Biology

   12         4524   Molecular and Cellular Biology

   13         3952   Biochimica et Biophysica Acta

   14         3794   Cell

   15         3548   Journal of Virology

   16         3496   European Journal of Biochemistry

   17         3297   Science

   18         3106   Biochemical Journal

   19         2796   Molecular Microbiology

   20         2793   Plant Physiology

   21         2542   Genomics

   22         2494   PLoS ONE

   23         2398   The American Journal of Human Genetics

   24         2321   Journal of Cell Biology

   25         2179   The Plant Cell

   26         2026   The Plant Journal

   27         1965   Human Molecular Genetics

   28         1927   Genes and Development

   29         1917   Plant Molecular Biology

   30         1886   Virology

   31         1834   Nature Genetics

   32         1789   Development

   33         1787   Molecular Biology of the Cell

   34         1732   Molecular Cell

   35         1661   Journal of Immunology

   36         1636   Human Mutation

   37         1565   Oncogene

   38         1446   Structure

   39         1424   Molecular and General Genetics

   40         1401   Journal of Biochemistry

   41         1398   Genetics

   42         1357   Journal of Cell Science

   43         1251   Blood

   44         1240   Infection and Immunity

   45         1183   Journal of General Virology

   46         1162   Developmental Biology

   47         1155   Microbiology

   48         1136   Archives of Biochemistry and Biophysics

   49         1124   Current Biology

   50         1053   Nature Communications

   51         1010   Applied and Environmental Microbiology

   52         1004   Journal of Neuroscience

   53          983   Acta Crystallographica, Section D

   54          921   Cancer Research

   55          896   FEMS Microbiology Letters

   56          879   Toxicon

   57          860   PLoS Genetics

   58          844   Yeast

   59          842   Protein Science

   60          837   Journal of Clinical Investigation

   61          830   American Journal of Physiology

   62          810   Neuron

   63          792   Scientific Reports

   64          759   Plant and Cell Physiology

   65          742   Human Genetics

   66          739   The Journal of Experimental Medicine

   67          687   Journal of Medical Genetics

   68          682   Proteins

   69          667   Mechanisms of Development

   70          655   The FEBS Journal

   71          648   Nature Structural Biology

   72          629   Nature Structural and Molecular Biology

   73          624   Nature Cell Biology

   74          617   Bioscience, Biotechnology, and Biochemistry

   75          597   PLoS Pathogens

   76          588   Current Genetics

   77          573   Developmental Cell

   78          566   Journal of Neurochemistry

   79          550   Molecular Endocrinology

   80          544   The Journal of Clinical Endocrinology and Metabolism

   81          537   Endocrinology

   82          536   Antimicrobial Agents and Chemotherapy

   83          506   Molecular and Biochemical Parasitology

   84          494   Mammalian Genome

   85          493   Journal of the American Chemical Society

   86          488   Experimental Cell Research

   87          472   Eukaryotic Cell

   88          467   Peptides

   89          457   Journal of Experimental Botany

   90          455   Planta

   91          451   RNA

   92          433   Immunogenetics

   93          431   The FASEB Journal

   94          430   EMBO Reports

   95          428   American Journal of Medical Genetics. Part A

   96          425   Molecular Pharmacology

   97          417   Acta Crystallographica, Section F

   98          417   Molecular Biology and Evolution

   99          410   Cell Reports

  100          407   Journal of Molecular Evolution

  101          407   European Journal of Human Genetics

  102          403   Immunity

  103          397   Molecular Plant-Microbe Interactions

  104          395   Journal of Investigative Dermatology

  105          393   DNA and Cell Biology

  106          388   Neurology

  107          380   DNA Sequence

  108          374   Biochimie

  109          373   Clinical Genetics

  110          371   Biology of Reproduction

  111          368   Comparative Biochemistry and Physiology

  112          357   Virus Research

  113          355   Genes to Cells

  114          342   Brain Research. Molecular Brain Research

  115          340   

  116          339   Journal of Lipid Research

  117          333   Developmental Dynamics

  118          333   The New England Journal of Medicine

  119          327   Annals of Neurology

  120          327   Nature Immunology

  121          317   BMC Genomics

  122          317   PLoS Biology

  123          313   Applied Microbiology and Biotechnology

  124          308   European Journal of Immunology

  125          306   Genome Research

  126          304   Investigative Ophthalmology and Visual Science

  127          301   Journal of Medicinal Chemistry

  128          299   Biological Chemistry Hoppe-Seyler

  129          293   Journal of Human Genetics

  130          281   Cytogenetics and Cell Genetics

  131          276   Journal of General Microbiology

  132          275   Glycobiology

  133          271   Archives of Microbiology

  134          256   Traffic

  135          255   Nature Chemical Biology

  136          252   Phytochemistry

  137          251   Molecular Immunology

  138          250   Molecular Genetics and Metabolism

  139          248   Journal of Cellular Biochemistry

  140          248   Brain

  141          247   Nature Medicine

  142          245   Protein Expression and Purification

  143          242   Fungal Genetics and Biology

  144          240   Cell Cycle

  145          234   DNA Research

  146          231   Circulation Research

  147          229   Diabetes

  148          227   Archives of Virology

  149          221   Cell Research

  150          218   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie





5.  STATISTICS FOR SOME LINE TYPES



The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,

as well as the number of entries with at least one such line, and the

frequency of the lines.



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----



References (RL)                      1290880                 2.27                                         

   Journal                           1117364     471204      1.96       1                                 

   Submitted to EMBL/GenBank/DDBJ     162144     146329      0.28       2                                 

   Submitted to other databases         7712       7058      0.01       3                                 

   Book citation                        1861       1838     <0.01       4                                 

   Plant Gene Register                   612        599     <0.01       5                                 

   Unpublished observations              510        506     <0.01       6                                 

   Thesis                                457        454     <0.01       7                                 

   Patent                                214        207     <0.01       8                                 

   Worm Breeder's Gazette                  6          6     <0.01       9                                 
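The "Average per entry" column appears to be the total number of lines divided by the 569213 entries of the release, and the last column ranks the subtypes by total. A Python sketch under those assumptions; showing averages that round to 0.00 as "<0.01" is also an assumption inferred from the table, and per_entry is an illustrative name:

```python
TOTAL_ENTRIES = 569213  # sequence entries in release 2023_01

# Total number of RL lines per subtype, copied from the table above.
rl_totals = {
    "Journal": 1117364,
    "Submitted to EMBL/GenBank/DDBJ": 162144,
    "Submitted to other databases": 7712,
    "Book citation": 1861,
    "Plant Gene Register": 612,
    "Unpublished observations": 510,
    "Thesis": 457,
    "Patent": 214,
    "Worm Breeder's Gazette": 6,
}


def per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Format lines per entry; values rounding to 0.00 become '<0.01'."""
    avg = total_lines / entries
    return f"{avg:.2f}" if round(avg, 2) >= 0.01 else "<0.01"


# Rank subtypes by total number of lines, largest first.
ranked = sorted(rl_totals, key=rl_totals.get, reverse=True)
ranks = {name: i for i, name in enumerate(ranked, start=1)}

rl_sum = sum(rl_totals.values())
print(f"{'References (RL)':<32}{rl_sum:>9}{per_entry(rl_sum):>8}")
for name, total in rl_totals.items():
    print(f"{name:<32}{total:>9}{per_entry(total):>8}{ranks[name]:>4}")
```

The same division applies to the Comments (CC) table that follows.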



Total number of distinct authors cited in UniProtKB/Swiss-Prot: 456645



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Comments (CC)                        2710313                 4.76                                         

   ACTIVITY REGULATION                 17500      17386      0.03      17                                 

   ALLERGEN                              944        944     <0.01      26                                 

   ALTERNATIVE PRODUCTS                25785      25785      0.05      13                                 

   BIOPHYSICOCHEMICAL PROPERTIES       11010      10967      0.02      20                                 

   BIOTECHNOLOGY                        1783       1730     <0.01      24                                 

   CATALYTIC ACTIVITY                 332534     252136      0.58       4                                 

   CAUTION                             14281      13975      0.03      18                                 

   COFACTOR                           132209     120111      0.23       7                                 

   DEVELOPMENTAL STAGE                 13975      13902      0.02      19                                 

   DISEASE                              8140       5464      0.01      21                                 

   DISRUPTION PHENOTYPE                19655      19630      0.03      16                                 

   DOMAIN                              57363      48948      0.10       9                                 

   FUNCTION                           486488     462639      0.85       2                                 

   INDUCTION                           25019      24942      0.04      14                                 

   INTERACTION                         23981      23981      0.04      15                                 

   MASS SPECTROMETRY                    7456       5750      0.01      22                                 

   MISCELLANEOUS                       45339      39830      0.08      11                                 

   PATHWAY                            143522     129570      0.25       6                                 

   PHARMACEUTICAL                        165        158     <0.01      29                                 

   POLYMORPHISM                         1452       1343     <0.01      25                                 

   PTM                                 63212      45240      0.11       8                                 

   RNA EDITING                           631        631     <0.01      28                                 

   SEQUENCE CAUTION                    45077      45007      0.08      12                                 

   SIMILARITY                         517720     513449      0.91       1                                 

   SUBCELLULAR LOCATION               362803     354462      0.64       3                                 

   SUBUNIT                            294539     289407      0.52       5                                 

   TISSUE SPECIFICITY                  50334      49871      0.09      10                                 

   TOXIC DOSE                            847        677     <0.01      27                                 

   WEB RESOURCE                         6549       5544      0.01      23                                 



Total number of comment topics: 29
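
The "Average per entry" column in these tables appears to be the total number of lines divided by the 569213 entries in the release, printed to two decimals, with values that would round to 0.00 shown as "<0.01". That rule is inferred from the figures, not stated in this document. The sketch below reproduces the column under that assumption; the function name average_per_entry is hypothetical.

```python
# Total entries in release 2023_01, as stated in the introduction.
TOTAL_ENTRIES = 569213


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format total/entries the way the tables seem to (assumed rule).

    Values that would print as 0.00 are reported as '<0.01'.
    """
    formatted = f"{total_lines / total_entries:.2f}"
    return "<0.01" if formatted == "0.00" else formatted


if __name__ == "__main__":
    # Totals copied from the Comments (CC) table above.
    for label, total in [("Comments (CC)", 2710313),
                         ("SIMILARITY", 517720),
                         ("ALLERGEN", 944)]:
        print(f"{label:<20} {average_per_entry(total)}")
```

Note that the divisor is the total number of entries in the release, not the "Number of entries" column, which counts only entries carrying at least one line of the given type.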





                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Features (FT)                        5236848                 9.20                                         

   ACT_SITE                           174890     104629      0.31       9                                 

   BINDING                           1178760     215224      2.07       1                                 

   CARBOHYD                           122879      31289      0.22      14                                 

   CHAIN                              577507     561640      1.01       2                                 

   COILED                              22407      15518      0.04      25                                 

   COMPBIAS                           173910      73855      0.31      10                                 

   CONFLICT                           138655      48339      0.24      12                                 

   CROSSLNK                            24619       8864      0.04      24                                 

   DISULFID                           133405      35543      0.23      13                                 

   DNA_BIND                            12132      10857      0.02      31                                 

   DOMAIN                             212859     130508      0.37       8                                 

   HELIX                              319527      28072      0.56       5                                 

   INIT_MET                            17506      17457      0.03      26                                 

   INTRAMEM                             3023       1387      0.01      34                                 

   LIPID                               13703       8790      0.02      28                                 

   MOD_RES                            260509      74269      0.46       7                                 

   MOTIF                               47105      30736      0.08      21                                 

   MUTAGEN                             91200      19068      0.16      17                                 

   NON_CONS                             2548        826     <0.01      35                                 

   NON_STD                               358        283     <0.01      36                                 

   NON_TER                             12622       9692      0.02      29                                 

   PEPTIDE                             12464       8646      0.02      30                                 

   PROPEP                              15128      12914      0.03      27                                 

   REGION                             317527     149307      0.56       6                                 

   REPEAT                             108801      15130      0.19      15                                 

   SIGNAL                              43992      43991      0.08      22                                 

   SITE                                64551      35023      0.11      19                                 

   STRAND                             326700      26466      0.57       4                                 

   TOPO_DOM                           148598      30219      0.26      11                                 

   TRANSIT                              9490       9370      0.02      32                                 

   TRANSMEM                           380067      79656      0.67       3                                 

   TURN                                77302      22891      0.14      18                                 

   UNSURE                               5741        891      0.01      33                                 

   VAR_SEQ                             52937      22548      0.09      20                                 

   VARIANT                            102839      17424      0.18      16                                 

   ZN_FING                             30587      13065      0.05      23                                 



Total number of feature keys: 36







                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank      Category

------------------------------------  -------- ---------  ---------  ----      -------------------------------------------

Cross-references (DR)               20577518                36.15                                                           

   ABCD                                 3011       3011      0.01     121      Protocols and materials databases            

   AGR                                 60696      60026      0.11      43      Organism-specific databases                  

   Allergome                            2034       1308     <0.01     129      Protein family/group databases               

   AlphaFoldDB                        545443     545443      0.96       9      3D structure databases                       

   Antibodypedia                       32200      32091      0.06      59      Protocols and materials databases            

   ArachnoServer                        1164       1154     <0.01     139      Organism-specific databases                  

   Araport                             16317      16221      0.03      90      Organism-specific databases                  

   Bgee                                61178      61178      0.11      41      Gene expression databases                    

   BindingDB                            6413       6413      0.01     107      Chemistry databases                          

   BioCyc                              58694      54700      0.10      45      Enzyme and pathway databases                 

   BioGRID                             61008      59122      0.11      42      Protein-protein interaction databases        

   BioGRID-ORCS                        44869      44286      0.08      54      Miscellaneous databases                      

   BioMuta                             20309      20283      0.04      74      Genetic variation databases                  

   BMRB                                 6899       6899      0.01     104      3D structure databases                       

   BRENDA                              20244      18446      0.04      76      Enzyme and pathway databases                 

   CarbonylDB                           1159       1159     <0.01     140      PTM databases                                

   CAZy                                 9596       8643      0.02      98      Protein family/group databases               

   CCDS                                49372      34555      0.09      51      Sequence databases                           

   CDD                                371049     294774      0.65      16      Family and domain databases                  

   CGD                                  2090       2073     <0.01     128      Organism-specific databases                  

   ChEMBL                               8820       8623      0.02      99      Chemistry databases                          

   ChiTaRS                             29712      29668      0.05      63      Miscellaneous databases                      

   CLAE                                  359        356     <0.01     155      Protein family/group databases               

   CollecTF                              137        137     <0.01     162      Gene expression databases                    

   ComplexPortal                       13247       7251      0.02      96      Protein-protein interaction databases        

   COMPLUYEAST-2DPAGE                     97         97     <0.01     164      2D gel databases                             

   ConoServer                            967        879     <0.01     142      Organism-specific databases                  

   CORUM                                5811       5811      0.01     108      Protein-protein interaction databases        

   CPTAC                                2525       1632     <0.01     124      Proteomic databases                          

   CPTC                                  374        374     <0.01     153      Protocols and materials databases            

   CTD                                 76045      75142      0.13      39      Organism-specific databases                  

   DEPOD                                 254        254     <0.01     160      PTM databases                                

   dictyBase                            4224       4110      0.01     115      Organism-specific databases                  

   DIP                                 17527      17487      0.03      86      Protein-protein interaction databases        

   DisGeNET                            17015      16796      0.03      88      Organism-specific databases                  

   DisProt                              1750       1742     <0.01     131      Family and domain databases                  

   DMDM                                16173      16173      0.03      91      Genetic variation databases                  

   DNASU                               48254      48176      0.08      52      Protocols and materials databases            

   DOSAC-COBS-2DPAGE                     145        145     <0.01     161      2D gel databases                             

   DrugBank                            30130       4742      0.05      62      Chemistry databases                          

   DrugCentral                          2564       2564     <0.01     123      Chemistry databases                          

   EchoBASE                             4158       4158      0.01     116      Organism-specific databases                  

   eggNOG                             338503     332694      0.59      18      Phylogenomic databases                       

   ELM                                  1813       1813     <0.01     130      Protein-protein interaction databases        

   EMBL                              1002102     556568      1.76       3      Sequence databases                           

   Ensembl                             98121      48400      0.17      36      Genome annotation databases                  

   EnsemblBacteria                    309724     298242      0.54      21      Genome annotation databases                  

   EnsemblFungi                        22947      22519      0.04      68      Genome annotation databases                  

   EnsemblMetazoa                      18704      11357      0.03      82      Genome annotation databases                  

   EnsemblPlants                       30932      21987      0.05      60      Genome annotation databases                  

   EnsemblProtists                      5293       5042      0.01     111      Genome annotation databases                  

   EPD                                 23236      23236      0.04      67      Proteomic databases                          

   ESTHER                               2974       2973      0.01     122      Protein family/group databases               

   euHCVdb                                55         44     <0.01     166      Organism-specific databases                  

   EvolutionaryTrace                   16750      16750      0.03      89      Miscellaneous databases                      

   ExpressionAtlas                     52753      52753      0.09      49      Gene expression databases                    

   FlyBase                              4111       3996      0.01     117      Organism-specific databases                  

   Gene3D                             737249     457643      1.30       6      Family and domain databases                  

   GeneCards                           20341      20202      0.04      72      Organism-specific databases                  

   GeneID                             323033     286927      0.57      20      Genome annotation databases                  

   GeneReviews                          1556       1553     <0.01     132      Organism-specific databases                  

   GeneTree                            58906      58895      0.10      44      Phylogenomic databases                       

   Genevisible                         55270      55270      0.10      47      Gene expression databases                    

   GeneWiki                            10352      10269      0.02      97      Miscellaneous databases                      

   GenomeRNAi                          22243      22243      0.04      70      Miscellaneous databases                      

   GlyConnect                           2372       2215     <0.01     125      PTM databases                                

   GlyCosmos                           28903      28903      0.05      64      PTM databases                                

   GlyGen                              15684      15684      0.03      92      PTM databases                                

   GO                                3176069     545442      5.58       1      Ontologies                                   

   Gramene                             30932      21987      0.05      61      Genome annotation databases                  

   GuidetoPHARMACOLOGY                  2137       2137     <0.01     127      Chemistry databases                          

   HAMAP                              330828     327896      0.58      19      Family and domain databases                  

   HGNC                                20365      20235      0.04      71      Organism-specific databases                  

   HOGENOM                            426187     426187      0.75      15      Phylogenomic databases                       

   HPA                                 19324      19204      0.03      80      Organism-specific databases                  

   IDEAL                                 986        986     <0.01     141      Family and domain databases                  

   IMGT_GENE-DB                          267        267     <0.01     159      Protein family/group databases               

   InParanoid                         141829     141829      0.25      28      Phylogenomic databases                       

   IntAct                              56773      56773      0.10      46      Protein-protein interaction databases        

   InterPro                          2387306     550043      4.19       2      Family and domain databases                  

   iPTMnet                             54133      54133      0.10      48      PTM databases                                

   jPOST                               26405      26405      0.05      65      Proteomic databases                          

   KEGG                               503343     478544      0.88      11      Genome annotation databases                  

   LegioList                             765        763     <0.01     146      Organism-specific databases                  

   Leproma                               672        669     <0.01     148      Organism-specific databases                  

   MaizeGDB                              528        524     <0.01     150      Organism-specific databases                  

   MalaCards                            5363       5358      0.01     109      Organism-specific databases                  

   MANE-Select                         18293      18049      0.03      84      Genome annotation databases                  

   MassIVE                             18718      18718      0.03      81      Proteomic databases                          

   MaxQB                               33703      33703      0.06      58      Proteomic databases                          

   MEROPS                              14174      13757      0.02      94      Protein family/group databases               

   MetOSite                             3111       3111      0.01     120      PTM databases                                

   MGI                                 17052      17011      0.03      87      Organism-specific databases                  

   MIM                                 22942      15905      0.04      69      Organism-specific databases                  

   MINT                                23435      23435      0.04      66      Protein-protein interaction databases        

   MoonDB                                348        348     <0.01     156      Protein family/group databases               

   MoonProt                              281        281     <0.01     158      Protein family/group databases               

   neXtProt                            20330      20330      0.04      73      Organism-specific databases                  

   NIAGADS                                69         69     <0.01     165      Organism-specific databases                  

   OGP                                   373        373     <0.01     154      2D gel databases                             

   OMA                                430151     430151      0.76      14      Phylogenomic databases                       

   OpenTargets                         18386      18243      0.03      83      Organism-specific databases                  

   Orphanet                             8121       4348      0.01     101      Organism-specific databases                  

   OrthoDB                            274272     274272      0.48      23      Phylogenomic databases                       

   PANTHER                           1001095     500948      1.76       4      Family and domain databases                  

   PathwayCommons                      19457      19457      0.03      79      Enzyme and pathway databases                 

   PATRIC                              92820      92820      0.16      37      Genome annotation databases                  

   PaxDb                              126414     126414      0.22      31      Proteomic databases                          

   PCDDB                                 127        127     <0.01     163      3D structure databases                       

   PDB                                256659      33056      0.45      24      3D structure databases                       

   PDBsum                             256659      33056      0.45      25      3D structure databases                       

   PeptideAtlas                        39579      39579      0.07      57      Proteomic databases                          

   PeroxiBase                            791        769     <0.01     145      Protein family/group databases               

   Pfam                               820629     538122      1.44       5      Family and domain databases                  

   PharmGKB                            18034      18015      0.03      85      Organism-specific databases                  

   Pharos                              20231      20231      0.04      77      Miscellaneous databases                      

   PHI-base                             1525       1264     <0.01     133      Miscellaneous databases                      

   PhosphoSitePlus                     39622      39622      0.07      56      PTM databases                                

   PhylomeDB                          115376     115376      0.20      33      Phylogenomic databases                       

   PIR                                124911     114600      0.22      32      Sequence databases                           

   PIRSF                              110779     109612      0.19      34      Family and domain databases                  

   PlantReactome                        1278        750     <0.01     136      Enzyme and pathway databases                 

   PomBase                              5129       5125      0.01     112      Organism-specific databases                  

   PRIDE                                 636        636     <0.01     149      Proteomic databases                          

   PRINTS                             150430     129197      0.26      27      Family and domain databases                  

   PRO                                 98139      98139      0.17      35      Miscellaneous databases                      

   ProMEX                                484        484     <0.01     152      Proteomic databases                          

   PROSITE                            488418     309536      0.86      12      Family and domain databases                  

   Proteomes                          487848     461476      0.86      13      Miscellaneous databases                      

   ProteomicsDB                        72546      45270      0.13      40      Proteomic databases                          

   PseudoCAP                            1448       1439     <0.01     135      Organism-specific databases                  

   Reactome                           140640      37755      0.25      29      Enzyme and pathway databases                 

   REBASE                                796        399     <0.01     144      Protein family/group databases               

   RefSeq                             622881     475767      1.09       8      Sequence databases                           

   REPRODUCTION-2DPAGE                  1260       1039     <0.01     137      2D gel databases                             

   RGD                                  8104       8103      0.01     102      Organism-specific databases                  

   RNAct                               43057      43057      0.08      55      Miscellaneous databases                      

   SABIO-RK                             5328       5328      0.01     110      Enzyme and pathway databases                 

   SASBDB                                710        710     <0.01     147      3D structure databases                       

   SFLD                                20245       9031      0.04      75      Family and domain databases                  

   SGD                                  6747       6742      0.01     106      Organism-specific databases                  

   SignaLink                           19961      19961      0.04      78      Enzyme and pathway databases                 

   SIGNOR                               7065       7065      0.01     103      Enzyme and pathway databases                 

   SMART                              204961     147794      0.36      26      Family and domain databases                  

   SMR                                508433     508433      0.89      10      3D structure databases                       

   STRING                             366062     366062      0.64      17      Protein-protein interaction databases        

   SUPFAM                             646499     458240      1.14       7      Family and domain databases                  

   SWISS-2DPAGE                         1177       1177     <0.01     138      2D gel databases                             

   SwissLipids                          1478       1394     <0.01     134      Chemistry databases                          

   SwissPalm                           13324      13324      0.02      95      PTM databases                                

   TAIR                                15082      15025      0.03      93      Organism-specific databases                  

   TCDB                                 8267       8196      0.01     100      Protein family/group databases               

   TIGRFAMs                           295753     274529      0.52      22      Family and domain databases                  

   TopDownProteomics                    3234       2956      0.01     118      Proteomic databases                          

   TreeFam                             46040      46017      0.08      53      Phylogenomic databases                       

   TubercuList                          2310       2274     <0.01     126      Organism-specific databases                  

   UCD-2DPAGE                            496        496     <0.01     151      2D gel databases                             

   UCSC                                50762      46305      0.09      50      Genome annotation databases                  

   UniLectin                             312        312     <0.01     157      Protein family/group databases               

   UniPathway                         139656     126046      0.25      30      Enzyme and pathway databases                 

   VEuPathDB                           79663      73836      0.14      38      Organism-specific databases                  

   VGNC                                 4482       4468      0.01     114      Organism-specific databases                  

   WBParaSite                             53         48     <0.01     167      Genome annotation databases                  

   World-2DPAGE                          935        923     <0.01     143      2D gel databases                             

   WormBase                             6898       5033      0.01     105      Organism-specific databases                  

   Xenbase                              4619       4560      0.01     113      Organism-specific databases                  

   ZFIN                                 3233       3233      0.01     119      Organism-specific databases                  



Total number of cross-referenced databases: 167
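
The cross-reference table can be read programmatically to aggregate counts by category. The sketch below is one possible approach, not an official parser: the regular expression and the names parse_rows and totals_by_category are assumptions made for illustration. It relies on every database name in this table being a single whitespace-free token, so it would not handle the multi-word subtypes of the CC table.

```python
import re
from collections import defaultdict

# A few rows copied from the cross-reference table above.
SAMPLE_ROWS = """\
   AlphaFoldDB                        545443     545443      0.96       9      3D structure databases
   PDB                                256659      33056      0.45      24      3D structure databases
   Allergome                            2034       1308     <0.01     129      Protein family/group databases
   GO                                3176069     545442      5.58       1      Ontologies
"""

# name, total, entries, average (possibly '<0.01'), rank, category
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<total>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<avg><?\d+\.\d+)\s+(?P<rank>\d+)\s+(?P<category>.+?)\s*$"
)


def parse_rows(text):
    """Return one dict per table row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "name": m["name"],
                "total": int(m["total"]),
                "entries": int(m["entries"]),
                "rank": int(m["rank"]),
                "category": m["category"],
            })
    return rows


def totals_by_category(rows):
    """Sum the 'Total number' column per database category."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["category"]] += row["total"]
    return dict(totals)


if __name__ == "__main__":
    for category, total in totals_by_category(parse_rows(SAMPLE_ROWS)).items():
        print(f"{category:<35} {total}")
```

Header, separator and blank lines fail the pattern and are ignored, so the whole table can be passed in unchanged.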



6.  AMINO ACID COMPOSITION



   6.1  Composition in percent for the complete database



   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.64

   Arg (R) 5.53   Glu (E) 6.72   Lys (K) 5.80   Thr (T) 5.36

   Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10

   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92

   Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.74   Val (V) 6.85



   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   



   [Figure omitted: amino acid composition plot, colored by residue class]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,

           blue = basic, black = aromatic, white = amide, yellow = sulfur





   6.2  Classification of the amino acids by their frequency



   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,

   Phe, Tyr, Met, His, Cys, Trp
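
The ordering in 6.2 follows directly from sorting the percentages in 6.1 in descending order. A minimal sketch, using only the values printed above (the names COMPOSITION and by_frequency are illustrative):

```python
# Percent composition copied from section 6.1.
COMPOSITION = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.64,
    "Arg": 5.53, "Glu": 6.72, "Lys": 5.80, "Thr": 5.36,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.74, "Val": 6.85,
}


def by_frequency(composition):
    """Return residue names sorted from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


if __name__ == "__main__":
    print(", ".join(by_frequency(COMPOSITION)))
```

No two residues share the same percentage in this release, so the ordering is unambiguous.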





7.  MISCELLANEOUS STATISTICS



4465 entries are encoded on a mitochondrion, and 3976 are encoded on a plasmid.



12199 entries are encoded on a plastid, of which 21 are encoded on apicoplasts,

11634 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles,

149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
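
As a quick consistency check, the six plastid subtypes should add up to the plastid total. The sketch below uses only the counts quoted above; the dictionary keys and the function name breakdown_matches are illustrative labels.

```python
# Counts copied from the plastid statement above.
PLASTID_TOTAL = 12199
PLASTID_BREAKDOWN = {
    "apicoplast": 21,
    "chloroplast": 11634,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified plastid": 199,
}


def breakdown_matches(total, parts):
    """True when the per-type counts sum exactly to the stated total."""
    return sum(parts.values()) == total


if __name__ == "__main__":
    print(breakdown_matches(PLASTID_TOTAL, PLASTID_BREAKDOWN))
```

For this release the subtypes sum to 12199, so every plastid-encoded entry is assigned to exactly one of the six types.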



Number of entries with at least one sequence correction: 80918