5-1, 5-2, 5-3, 5-4, 5-5, 5-6, 5-7

5-1

You are interested in analyzing some hard-to-obtain data from two separate databases. Each database contains n numerical values—so there are 2n values total—and you may assume that no two values are the same. You’d like to determine the median of this set of 2n values, which we will define here to be the nth smallest value. However, the only way you can access these values is through queries to the databases. In a single query, you can specify a value k to one of the two databases, and the chosen database will return the kth smallest value that it contains. Since queries are expensive, you would like to compute the median using as few queries as possible. Give an algorithm that finds the median value using at most O(log n) queries.

Solution:

Binary search

This algorithm can be generalized to find the KTH element. we just need to specify k=n. Now we discribe the Kth algorithm in 2 database. Suppose we have database A and database B. the ith data of them can be represented as A[i] and B[i]. To find the nth smallest value, we can get A[k/2-1] and B[k/2-1] if n-1< size of A & B . Then we compare the A[k/2-1] with B[k/2-1], get minimum, if they are the same just minimum=A[k-1]=B[k-1]. Because there are k/2-1+k/2-1 =k-2 elements before the minimum, then minimum can't be the kth element. so the elements before the minimum and the element can't be the nth element. Now we need to "delete" them from the search range, that means we remeber the start number and every time we get A[i] means get A[i-start] of the "new database". If the k/2-1 is larger than the length of A or B, we just choose the last one to compare.

so now we need to update n with deleted nums.

if finally we update n to 1 we can compare the element being pointed in A and B and choose the minimum of them.

the algorithm is shown below.

class Solution {
    public double findMedianSortedArrays(int[] nums1, int[] nums2) {
        int length1 = nums1.length, length2 = nums2.length;
        int totalLength = length1 + length2;// in this case: length1=length2=n;
        if (totalLength % 2 == 1) {
            int midIndex = totalLength / 2;
            double median = getKthElement(nums1, nums2, midIndex + 1);
            return median;
        } else {
            int midIndex1 = totalLength / 2 - 1, midIndex2 = totalLength / 2;
            double median = (getKthElement(nums1, nums2, midIndex1 + 1) + getKthElement(nums1, nums2, midIndex2 + 1)) / 2.0;
            return median;
        }
    }

    public int getKthElement(int[] nums1, int[] nums2, int k) {
        /* 主要思路：要找到第 k (k>1) 小的元素，那么就取 pivot1 = nums1[k/2-1] 和 pivot2 = nums2[k/2-1] 进行比较
         * 这里的 "/" 表示整除
         * nums1 中小于等于 pivot1 的元素有 nums1[0 .. k/2-2] 共计 k/2-1 个
         * nums2 中小于等于 pivot2 的元素有 nums2[0 .. k/2-2] 共计 k/2-1 个
         * 取 pivot = min(pivot1, pivot2)，两个数组中小于等于 pivot 的元素共计不会超过 (k/2-1) + (k/2-1) <= k-2 个
         * 这样 pivot 本身最大也只能是第 k-1 小的元素
         * 如果 pivot = pivot1，那么 nums1[0 .. k/2-1] 都不可能是第 k 小的元素。把这些元素全部 "删除"，剩下的作为新的 nums1 数组
         * 如果 pivot = pivot2，那么 nums2[0 .. k/2-1] 都不可能是第 k 小的元素。把这些元素全部 "删除"，剩下的作为新的 nums2 数组
         * 由于我们 "删除" 了一些元素（这些元素都比第 k 小的元素要小），因此需要修改 k 的值，减去删除的数的个数
         */

        int length1 = nums1.length, length2 = nums2.length;
        int index1 = 0, index2 = 0;
        int kthElement = 0;

        while (true) {
            // 边界情况
            if (index1 == length1) {
                return nums2[index2 + k - 1];
            }
            if (index2 == length2) {
                return nums1[index1 + k - 1];
            }
            if (k == 1) {
                return Math.min(nums1[index1], nums2[index2]);
            }
            
            // 正常情况
            int half = k / 2;
            int newIndex1 = Math.min(index1 + half, length1) - 1;
            int newIndex2 = Math.min(index2 + half, length2) - 1;
            int pivot1 = nums1[newIndex1], pivot2 = nums2[newIndex2];
            if (pivot1 <= pivot2) {
                k -= (newIndex1 - index1 + 1);
                index1 = newIndex1 + 1;
            } else {
                k -= (newIndex2 - index2 + 1);
                index2 = newIndex2 + 1;
            }
        }
    }
}

take this as an example

Untitled

Each cycle reduces the search scope by half, so the time complexity is O(log(2n))=O(logn)

5-2

Recall the problem of finding the number of inversions. As in the text,we are given a sequence of n numbers a1, . . . , an, which we assume are alldistinct, and we define an inversion to be a pair i < j such that ai > aj. We motivated the problem of counting inversions as a good measure of how different two orderings are. However, one might feel that this measure is too sensitive. Let’s call a pair a significant inversion if i < j and ai > 2aj. Give an O(n log n) algorithm to count the number of significant inversions between two orderings.

Solution：

Use a recursive divide-and-conquer algorithm ALG which is similar to we learned in the class counting inversions. The difference is in conquer part, we compare bi with 2*bj for counting significant inversions.

Let's define ALG formally. For n=1 ALG just returns N=0 and $\left\{a_{1}\right\}$ for the sequence. For n>1 ALG does the following: