We wish to train a machine learning algorithm on an array of floating-point numbers in the
interval [0.0, 1.0). The data is horribly unbalanced (not evenly distributed) and we wish to
filter the dataset to obtain a subset containing an equal number of values from each
interval
[0, 0.2), [0.2, 0.4), ... [0.8, 1.0), throwing away as little data as possible.
Write a program which reads comma-separated floating-point numbers in a single line from
standard input and prints the filtered data to standard output in the same format
Note: Solve this in linear time, if possible. Priority will be given to those who solve in linear
time.
Explanation Example
Input: 0.11,0.12,0.13,0.23,0.34,0.35,0.47,0.59,0.77,0.83,0.85,0.91,0.95
On classifying the above input data from example 4, Subset in each interval will look as below:
Since the interval [0.6 - 0.8) has the minimum subset of size 1. We choose 1 element from
the rest of the intervals.
Output: 0.11,0.23,0.47,0.77,0.83
*if the interval [0.6 - 0.8) had more than 3 elements then we would choose 2 elements from all
subset, since the interval with minimum subset, would be [0.4 - 0.6) and of size 2.
Interval Data
[0 - 0.2) 0.11,0.12,0.13
[0.2 - 0.4) 0.23,0.34,0.35
[0.4 - 0.6) 0.47,0.59
[0.6 - 0.8) 0.77
[0.8 - 1.0) 0.83,0.85,0.91,0.95
Sample Examples:
Example 1
Input: 0.1,0.3,0.5,0.7,0.9
Output: 0.1,0.3,0.5,0.7,0.9
Example 2
Input: 0.1,0.3,0.5,0.7,0.9,0.5
Output: 0.1,0.3,0.5,0.7,0.9
Example 3
Input: 0.15,0.12,0.35,0.38,0.55,0.56,0.57,0.75,0.77,0.9,0.94
Output: 0.15,0.12,0.35,0.38,0.55,0.56,0.75,0.77,0.9,0.94
Comments
Leave a comment