Returning the nth largest values of a vector in R

Suppose I have a vector in R of the following six values:
U <- c(5, 12, 23, -7, 12, 3),
and I want the two largest values (or their indices in the vector). In this simple example it’s clear that the largest values are 23 and 12, and their indices are either 3 and 2, or 3 and 5 (depending on how we define the problem in the case of tied values).

One solution to this is:

sort(U, decreasing=T)[1:2]

to get the actual values, and

order(U, decreasing=T)[1:2]

to get the indices (which in this case is 3 and 2).

However, it is well known that sorting is typically an O(n log(n)) problem, whereas selecting the kth largest or smallest value is a linear O(kn) problem.

No problems, partial sort can solve this problem. Partial sorting partitions the vector into two with the first k being the smallest values of the vector:

sort(U, partial=2)

gives

[1] -7 3 12 5 23 12

however, partial sorting in R can’t sort in reverse order for some reason! But of course, we can sort -U instead. So:

(-sort(-U, partial=2))[1:2]

solves the problem of quickly returning the 2 largest values of a vector.

But finding the indices of these values is somewhat harder (although it probably shouldn’t be!). Looking at the help for the functions order and sort.list, we see that neither of these support partial sorting. Another option would be to cbind the vector U to 1:6 and partially sort the table as before by the first column, but as far as I can tell this is not possible.

 

So here is the solution I came up with:

argsortN <- function(a_vector, N){
  Nth.largest <- -sort(-a_vector, partial=N)[N] # from before

  # Find the indices of all values greater than the nth largest:
  ix.gte.Nth.largest <- which(a_vector >= Nth.largest) 

  # The next line sorts these in decreasing order:
  argN <- ix.gte.Nth.largest[order(a_vector[ix.gte.Nth.largest], decreasing=T)]

  # The only problem is if there are ties, the length of argN can be greater than N:
  return(argN[1:N])
}

The exact indices returned by this function in the case of ties will depend on the way sorting of ties is handled by order.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s