
Modelling Continuous Percentile Rank Scores and Integrated Impact Indicators (I3) / Une modélisation des notations continues de classement par pourcentage et des indicateurs intégrés d'impact (I3)
Based on the well-known discrete definitions, we introduce a continuous framework for percentile rank scores and integrated impact indicators (I3). This is done by taking the integral of a scoring function multiplied by a distribution function. Examples are provided by considering several distribution functions and two scoring functions, where the distribution function can take any form and the scoring function is nondecreasing.
Sur la base des définitions discrètes bien connues, nous introduisons un cadre continu pour les notations de classement par pourcentage et les indicateurs intégrés d'impact (I3). Ceci est fait en prenant l'intégrale d'une fonction de notation multipliée par une fonction de distribution. Des exemples sont fournis en tenant compte de plusieurs fonctions de distribution et de deux fonctions de notation, où la fonction de distribution peut prendre n'importe quelle forme et la fonction de notation est non décroissante.
percentile rank scores, integrated impact indicator, I3, percentile, continuous modelling
notations de classement par pourcentage, indicateur intégré d'impact, I3, centile, modélisation en continu
Introduction
Although researchers have realized that using arithmetic averages in scientometric investigations may lead to biased results (Leydesdorff et al. 2011), it has taken several years before an acceptable alternative was formulated. Slowly a consensus has arisen, leading to the use of percentiles and percentile rank classes (Bornmann 2010; Bornmann et al. 2013; Leydesdorff and Bornmann 2011; Leydesdorff et al. 2011; Opthof and Leydesdorff 2010; Rousseau 2012). These notions are based on the concept of percentiles (or quantiles) for discrete data. As most informetric models can also be described within a continuous context (Egghe 2005), we propose a continuous analogue of the percentile approach and, as an illustration, calculate the resulting percentile rank scores and Integrated Impact Indicator (see further for definitions) for some basic functions.
Definitions
In this section, we use the framework as presented by Rousseau (2012). Consider a set A and a reference set S containing all elements in A, hence A ⊆ S. Moreover, we assume that a function X from S to the positive real numbers is given, leading to the image multiset X(S). Note that we consider X(S) as a multiset, as we consider the images X(s), s in S, as separate entities (even if their values are the same). A standard situation is that A consists of a set of articles, S consists of all articles published in the same year in the journals in which the articles of A appeared, and X maps an article to the number of citations it has received over a given period (several articles may have the same number of citations).
Now a rule is given which subdivides set S into M disjoint classes, based on the values of the function X. If a document belongs to class m, then it receives a score x_{m}. Note that this score only depends on the class (and hence on S), but may not depend on set A (Rousseau and Ye 2012). Again a standard situation is the case that there are 100 percentile classes (or 10 decile classes). In the case of percentiles, articles belonging to the top 1% receive a score of 100; those belonging to the top 2% (and not to the top 1%) receive a score of 99, and so on. Besides classes of equal breadth, one may also use classes of unequal breadth such as the six US National Science Foundation categories (National Science Board 2010).
Definition 1. Percentile rank scores (Bornmann and Mutz 2011; Leydesdorff et al. 2011)
Let A be a set of N documents, assume there are M classes, and let n_{A}(m) be the number of documents in A that belong to class m. Then the percentile rank score of A is defined as:
$$R(A) = \frac{1}{N} \sum_{m=1}^{M} x_{m} \, n_{A}(m) \tag{1}$$
R(A) can be seen as a weighted average of scores. Clearly, the value of R(A) depends not only on A, but also on the reference set S, the M classes used, and their scores. We note that this indicator allows a lot of flexibility, but hence also a lot of subjectivity, as one can adapt the reference set, the classes, and the scores.
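As a minimal sketch of Definition 1, the following Python snippet computes R(A) as the weighted average of class scores. The function name `percentile_rank_score`, the four-class scoring, and the toy counts are illustrative assumptions and do not come from the cited papers.

```python
def percentile_rank_score(counts, scores):
    """Return R(A) = (1/N) * sum over m of x_m * n_A(m).

    counts[m] is n_A(m), the number of documents of A in class m;
    scores[m] is x_m, the score attached to class m.
    """
    if len(counts) != len(scores):
        raise ValueError("counts and scores must have the same length")
    n_docs = sum(counts)  # N, the number of documents in A
    if n_docs == 0:
        raise ValueError("set A must contain at least one document")
    return sum(x * n for x, n in zip(scores, counts)) / n_docs


# Hypothetical example: M = 4 quartile classes scored 1 (bottom) to 4 (top).
scores = [1, 2, 3, 4]
counts = [5, 3, 1, 1]  # n_A(m) for a set A of N = 10 documents
r_value = percentile_rank_score(counts, scores)
print(r_value)
```

Changing the class boundaries or the scores changes `r_value` for the same set A, which is the flexibility (and subjectivity) noted above.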
Definition 2. The Integrated Impact Indicator (I3)
The I3 indicator (Leydesdorff and Bornmann 2011), where I3 stands for Integrated Impact Indicator, is defined in a similar way as the percentile rank score as given in Equation (1). The role of the reference set S is the same, but this time, no division by N is performed. Hence, using the notation introduced earlier, we have the following definition.
Definition 3. The I3 score of a set A is defined as:
$$I3(A) = \sum_{m=1}^{M} x_{m} \, n_{A}(m) \tag{2}$$
Clearly, I3(A) = N · R(A).
In the context of journal impact, I3 is preferred to R as "having an impact" implies publishing many articles and receiving many citations.
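The difference between the two indicators can be illustrated with a small, purely hypothetical comparison: two sets with the same class profile but different sizes share the same R, while I3 grows with the number of documents. The function names and the data below are assumptions made for this sketch.

```python
def i3_score(counts, scores):
    """Return I3(A) = sum over m of x_m * n_A(m) (no division by N)."""
    return sum(x * n for x, n in zip(scores, counts))


def r_score(counts, scores):
    """Return R(A) = I3(A) / N, with N the number of documents in A."""
    return i3_score(counts, scores) / sum(counts)


scores = [1, 2, 3, 4]            # hypothetical quartile scores, bottom to top
small_journal = [0, 0, 1, 1]     # 2 documents, all in the upper classes
large_journal = [0, 0, 10, 10]   # 20 documents with the same class profile

# Both sets have the same average score R ...
print(r_score(small_journal, scores), r_score(large_journal, scores))
# ... but I3 is ten times larger for the larger set.
print(i3_score(small_journal, scores), i3_score(large_journal, scores))
```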
We generalize the step functions (in Equation (2)) to continuous functions w(x) and k(x) defined on an interval [0,C], C > 0. This leads to:
$$R = \int_{0}^{C} w(x) \, k(x) \, dx \tag{3}$$
The function k(x) is a density function, and w(x) > 0 is a scoring function acting as a weight for the function k(x). The origin of the interval [0,C] corresponds to the worst results—and hence the lowest scores—while the end point C corresponds to the best results and hence the highest scores. Consequently, w(x) is a nondecreasing (usually strictly increasing) function, while the density function k(x) ≥ 0 can have any form. If f(x) is a positive integrable function on [0,C], then we denote by N the integral ∫^{C}_{0} f(x)dx and f(x)/N becomes a density function on [0,C]. For any k(x) = f(x)/N, R (as defined in Equation (3)) times N becomes the continuous analogue of I3, which we also denote as I3. Hence, in a continuous setting I3 = ∫^{C}_{0} w(x) · f(x)dx, where f(x) is a positive integrable function on [0,C].
The reason that we refer to our approach as a continuous approach is that, besides the continuous density function k(x), we also consider a continuous analogue of the discrete weight or scoring values x_{m}.
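The continuous definitions can also be explored numerically. The sketch below approximates N, R, and I3 with a composite Simpson rule. The helper names `simpson` and `continuous_scores` and the example functions are illustrative assumptions, not part of the original framework.

```python
def simpson(g, lower, upper, n=1000):
    """Approximate the integral of g on [lower, upper] by composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (upper - lower) / n
    total = g(lower) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lower + i * h)
    return total * h / 3


def continuous_scores(w, f, C):
    """Return (N, R, I3) for a scoring function w and a positive function f on [0, C]."""
    N = simpson(f, 0.0, C)                        # N = integral of f over [0, C]
    I3 = simpson(lambda x: w(x) * f(x), 0.0, C)   # I3 = integral of w * f
    R = I3 / N                                    # R = integral of w * k, with k = f / N
    return N, R, I3


# Hypothetical choices: w(x) = x**2 and f(x) = 3 on [0, 2].
N, R, I3 = continuous_scores(lambda x: x ** 2, lambda x: 3.0, 2.0)
print(N, R, I3)
```

Computing I3 first and dividing by N avoids building the density k(x) explicitly; both routes give the same R because k(x) = f(x)/N.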
Continuous examples: A first scoring function
We first consider the simple case that w(x) is a linearly increasing function on the interval [0,C]:
$$w(x) = a \cdot x + c$$
where c is a constant and a > 0 (as w is an increasing function). Choosing a value of zero at the starting point, w(0) = 0, leads to c = 0; hence
$$w(x) = a \cdot x$$
Now we discuss different basic forms for the function k(x).
Case 1. k(x) is a constant, corresponding to a uniform distribution
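If f(x) = K (a constant), then N = K · C and k(x) = 1/C is a density function on [0,C]. With w(x) = a · x we then obtain
$$R = \int_{0}^{C} a x \cdot \frac{1}{C} \, dx = \frac{a C}{2}$$
and
$$I3 = N \cdot R = \frac{a K C^{2}}{2}$$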
Case 2. k(x) is a linear function
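We consider the linearly decreasing function f(x) = m · (C − x), with m > 0 and f(C) = 0. Note that we take a linearly decreasing function because it is assumed here that there are many poor cases and few better ones. Normalizing yields N = ∫^{C}_{0} m · (C − x)dx = m · C^{2}/2, and hence k(x) = 2 · (C − x)/C^{2} is a density function on [0,C].
Based on Equation (3) we obtain:
$$R = \int_{0}^{C} a x \cdot \frac{2 (C - x)}{C^{2}} \, dx = \frac{2a}{C^{2}} \left( \frac{C^{3}}{2} - \frac{C^{3}}{3} \right) = \frac{a C}{3}$$
and hence:
$$I3 = N \cdot R = \frac{a \, m \, C^{3}}{6}$$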
Case 3. The function k(x) is an exponential function
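We consider the function f(x) = b · e^{m·x}, with m ≠ 0 and b > 0. Then N = ∫^{C}_{0} b · e^{m·x}dx = (b/m) · (e^{m·C} − 1), leading to the density function k(x) = m · e^{m·x}/(e^{m·C} − 1). Then:
$$R = \int_{0}^{C} a x \cdot \frac{m \, e^{mx}}{e^{mC} - 1} \, dx = \frac{a \left( e^{mC} (mC - 1) + 1 \right)}{m \left( e^{mC} - 1 \right)}$$
and hence:
$$I3 = N \cdot R = \frac{a \, b \left( e^{mC} (mC - 1) + 1 \right)}{m^{2}}$$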
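For the exponential case, the only non-trivial step is the integral of x · e^{m·x}, which can be handled by integration by parts. The following is a sketch of that step, assuming the linear scoring function w(x) = a · x and m ≠ 0; it is our working of the calculation rather than a reproduction of the original typeset derivation.

```latex
\begin{aligned}
\int_{0}^{C} x \, e^{mx} \, dx
  &= \left[ \frac{x \, e^{mx}}{m} \right]_{0}^{C}
     - \frac{1}{m} \int_{0}^{C} e^{mx} \, dx \\
  &= \frac{C \, e^{mC}}{m} - \frac{e^{mC} - 1}{m^{2}}
   = \frac{e^{mC} (mC - 1) + 1}{m^{2}}, \\
R &= \frac{a \, m}{e^{mC} - 1} \int_{0}^{C} x \, e^{mx} \, dx
   = \frac{a \left( e^{mC} (mC - 1) + 1 \right)}{m \left( e^{mC} - 1 \right)}.
\end{aligned}
```

As a quick sanity check, for a = m = C = 1 the integral equals 1, so R = 1/(e − 1).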
Case 4. The function k(x) is a decreasing power function
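We consider f(x) = 1/(x + m), m > 0. Then N = ∫^{C}_{0} 1/(x + m) dx = ln((C + m)/m), and hence k(x) = f(x)/N is a density function. This leads to:
$$R = \frac{a}{N} \int_{0}^{C} \frac{x}{x + m} \, dx = \frac{a \left( C - m \ln \frac{C + m}{m} \right)}{\ln \frac{C + m}{m}}$$
and
$$I3 = N \cdot R = a \left( C - m \ln \frac{C + m}{m} \right)$$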
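As a rough numerical check of this case, assume the linear scoring function w(x) = a · x. Writing x/(x + m) = 1 − m/(x + m), the integral I3 = ∫ a · x/(x + m) dx over [0,C] evaluates to a · (C − m · ln((C + m)/m)). The Python sketch below compares that closed form with a midpoint-rule approximation; the parameter values and the helper name `midpoint` are hypothetical.

```python
import math


def midpoint(g, lower, upper, n=200_000):
    """Approximate the integral of g on [lower, upper] by the midpoint rule."""
    h = (upper - lower) / n
    return h * sum(g(lower + (i + 0.5) * h) for i in range(n))


# Hypothetical parameter values, chosen only for illustration.
a, m, C = 2.0, 0.5, 10.0

N = math.log((C + m) / m)        # N = ln((C + m) / m)
i3_closed = a * (C - m * N)      # closed form for I3
i3_numeric = midpoint(lambda x: a * x / (x + m), 0.0, C)

# The last value printed is R = I3 / N.
print(i3_closed, i3_numeric, i3_closed / N)
```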
Case 5. k(x) is a triangular peak function
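If f(x) is a triangular peak function with peak point at (C/2, b · C/2), this yields:
$$f(x) = \begin{cases} b \, x, & 0 \le x \le C/2 \\ b \, (C - x), & C/2 < x \le C \end{cases}$$
Normalizing f(x) leads to N = b · C^{2}/4 and the corresponding density function k(x) = f(x)/N. The integral for R consists of two parts:
$$R = \int_{0}^{C/2} a x \cdot \frac{4x}{C^{2}} \, dx + \int_{C/2}^{C} a x \cdot \frac{4 (C - x)}{C^{2}} \, dx = \frac{a C}{6} + \frac{a C}{3} = \frac{a C}{2}$$
leading to:
$$I3 = N \cdot R = \frac{a \, b \, C^{3}}{8}$$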
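The two-part integral for the triangular case can be checked exactly. The sketch assumes a triangle that rises as b · x up to C/2 and falls as b · (C − x) afterwards, with w(x) = a · x. On each half the integrands are polynomials of degree at most 2, so a single Simpson panel evaluated with rational arithmetic integrates them without error. The parameter values and the helper name `simpson_panel` are assumptions made for this illustration.

```python
from fractions import Fraction


def simpson_panel(g, lower, upper):
    """One Simpson panel; exact for polynomials of degree <= 3."""
    mid = (lower + upper) / 2
    return (upper - lower) * (g(lower) + 4 * g(mid) + g(upper)) / 6


# Hypothetical parameter values, kept as exact fractions.
a, b, C = Fraction(3), Fraction(2), Fraction(4)
half = C / 2


def f(x):
    """Triangular peak function with peak point (C/2, b*C/2)."""
    return b * x if x <= half else b * (C - x)


def w(x):
    """Linear scoring function w(x) = a*x."""
    return a * x


def wf(x):
    """Integrand of I3."""
    return w(x) * f(x)


# Integrate each linear piece separately so every integrand is a polynomial.
N = simpson_panel(f, Fraction(0), half) + simpson_panel(f, half, C)
I3 = simpson_panel(wf, Fraction(0), half) + simpson_panel(wf, half, C)
R = I3 / N

print(N, R, I3)
```

With these values the results agree with b · C^{2}/4 for N, a · C/2 for R, and a · b · C^{3}/8 for I3. R coincides with the uniform case because the triangular density is symmetric around C/2 and w is linear.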
Continuous examples: A second scoring function
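Finally we consider a second scoring function w(x) which increases faster than a linear function. We consider the following increasing power function: w(x) = a · x^{α}, a > 0, α > 1. For f(x) we take the function b · x^{β}, with b > 0 (and β > −1, so that f is integrable on [0,C]). Then N = ∫^{C}_{0} b · x^{β}dx = (b/(β + 1)) · C^{β+1}. This leads to:
$$R = \int_{0}^{C} a x^{\alpha} \cdot \frac{(\beta + 1) \, x^{\beta}}{C^{\beta + 1}} \, dx = \frac{a (\beta + 1)}{\alpha + \beta + 1} \, C^{\alpha}$$
and
$$I3 = N \cdot R = \frac{a \, b}{\alpha + \beta + 1} \, C^{\alpha + \beta + 1}$$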
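For the power-function case, integrating a · x^{α} · b · x^{β} over [0,C] gives I3 = a · b · C^{α+β+1}/(α + β + 1), and dividing by N = b · C^{β+1}/(β + 1) gives R. The sketch below evaluates these closed forms and compares I3 with a crude midpoint sum. The function name `power_case` and all parameter values are hypothetical, and the check β > −1 is our assumption to keep f integrable at 0.

```python
def power_case(a, alpha, b, beta, C):
    """Closed-form (N, R, I3) for w(x) = a*x**alpha and f(x) = b*x**beta on [0, C]."""
    if beta <= -1:
        raise ValueError("beta must exceed -1 for f to be integrable on [0, C]")
    N = b * C ** (beta + 1) / (beta + 1)
    I3 = a * b * C ** (alpha + beta + 1) / (alpha + beta + 1)
    R = I3 / N
    return N, R, I3


# Hypothetical parameters: quadratic scoring, linearly increasing f.
a, alpha, b, beta, C = 1.0, 2.0, 3.0, 1.0, 2.0
N, R, I3 = power_case(a, alpha, b, beta, C)

# Crude midpoint-rule check of I3 = integral of w(x) * f(x) over [0, C].
steps = 100_000
h = C / steps
numeric_i3 = h * sum(
    a * x ** alpha * b * x ** beta
    for x in ((i + 0.5) * h for i in range(steps))
)

print(N, R, I3, numeric_i3)
```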
Conclusion
We calculated the value of I3 in a continuous framework for different distribution functions k(x) and for two scoring functions w(x) = ax and w(x) = a · x^{α}. In this way we introduced a method by which R and the I3 indicator can be used in a continuous modelling context. In all cases, the resulting values are functions of the parameters introduced by the functions w(x) and k(x). One reviewer correctly pointed out that this flexibility may lead to some additional difficulties if one wants to use a continuous approach for modelling real data. In such cases, w(x), k(x), and possibly C must be estimated. However, when working on this article we had an abstract theoretical framework in mind, namely one not based on real data. By proposing this continuous approach, we hope to stimulate further investigations within a continuous modelling approach to I3.
yye@nju.edu.cn
Universiteit Antwerpen (UA), IBW, Belgium
KU Leuven, Belgium
ronald.rousseau@uantwerpen.be
Acknowledgements
The authors acknowledge the National Natural Science Foundation of China (NSFC Grants No. 7101017006 and 71173187) for financial support. They thank their colleagues Wolfgang Glänzel, Leo Egghe, and Loet Leydesdorff for useful comments. Finally, anonymous reviewers and the editor are acknowledged for their useful suggestions to improve the manuscript.