The Subset Sum Problem Solved in Linear Time for Dense Enough Inputs

, I’ll present an answer to the Subset Sum Problem, which has linear time complexity (), if all of the ‘’ input values are “close enough” to one another. We are going to see shortly what that condition actually means, in addition to why it’d often be kept (or almost kept) in practice. For the inputs which can be “almost close” to one another, the algorithm still has a certain probability of working fast enough.

This text is organized in the next way:

In chapter 1, we’ll recall the Subset sum problem and show probably probably the most trivial solution to it, based on the brute-force technique.
Chapter 2 recalls the category of NP-complete problems and the way the Subset sum problem is expounded to it.
Chapter 3 reviews the commonly used solution based on Dynamic programming technique, and highlights why it’s a pseudo-polynomial algorithm.
In chapter 4, I’ll introduce the brand new algorithm, called “Interval-based solution”.
Chapter 5 will show that, similarly to the Dynamic programming technique, the Interval-based algorithm may also restore the precise subset of the items that form given goal sum, and not only answer if the goal sum is achievable or not.
Finally, in chapter 6, we’ll see why the Interval-based algorithm actually reaches linear time complexity, if all of the ‘’ input values are close enough to one another (in addition to we’ll understand there what the “close enough” condition actually means).

1. Recalling the Subset sum problem

The definition of the Subset Sum Problem (SSP) is sort of easy:

We’re given ‘’ input integers ={₁, ₂, …, }, and a goal sum ‘’. It’s vital to work out if there exists such a subset ‘’ of those ‘’ integers, numbers of which is able to sum up exactly to ‘’. If it does, the issue may additionally ask to report all of the numbers of the subset ‘’.

Here is an example of input to the SSP:

And here is its solution:

Note, there will be cases with multiple solution, for a given goal sum ‘’:

Also, there will be cases when there is no such thing as a solution in any respect, meaning that there is no such thing as a such subset, integers of which is able to accumulate exactly to the given goal sum ‘’:

In this text, we’ll consider only positive input values ∈ . Nevertheless, all of the ideas to be presented beneath will be applied (with minor modifications) also to the case when there are negative input values as well.

Within the Subset sum problem (SSP), a slight change within the input values can lead to an entire change of the reply. Like, if there are various subsets that form the goal sum ‘’, it doesn’t mean that there will likely be even one subset, that forms the goal sum ‘+1’. This fact also makes the SSP not a straightforward one to unravel.

Even when there are usually not many input numbers, it’d still be difficult to unravel the issue on a bit of paper. Often, we’ll need to think about all different subsets and check which one in all them accumulates towards ‘’. That approach lies within the trivial solution of SSP, brute-force, which just enumerates all possible subsets.

The thought for brute-force implementation is: considering that there exists such a subset ⊆, items of which accumulate towards ‘’, let’s address the last input item . There are two scenarios:

either participates within the result subset ‘’,
or it doesn’t.

Having that said, if does take part in ‘’ (∈), then we should always proceed looking for such a subset from the reduced set of numbers “ {} = {₁, ₂, …, _-1}”, which is able to accumulate now to “–:

Otherwise, if the case is that doesn’t take part in ‘’ (∉), then we should always proceed looking for such a subset from the reduced set of numbers “ {} = {₁, ₂, …, _n-1}”, which is able to itself accumulate to the identical goal sum ‘’:

Surely, we don’t know upfront which case is true; that’s why the brute-force algorithm just tries one case after one other.

When trying to unravel the reduced problem (i.e., finding a correct subset from the reduced set of numbers “ {} = {₁, ₂, …, _-1}”), the brute-force algorithm applies the identical logic recursively. So, no matter which branch we took, next, the worth _-1 will likely be considered, and at first, we’ll search for an answer where _-1 does take part in the result subset, after which we’ll search for such an answer where it doesn’t.

While recursion deepens, if the remaining goal sum becomes zero, it signifies that we’re currently on the correct branch, and the considered numbers already accumulate to the unique goal sum ‘’.

The pseudo-code of the mentioned brute-force algorithm becomes like this:

// Searches for such a subset of ‘X’ which has a sum equal 
// to ‘q’, and places it into ‘Y’. The set ‘X’ is assumed 
// to have ‘n’ integers.
procedure ssp_brute_force( 
		X[1..n]: set of Integers, 
		n: Integer, 
		q: Integer, 
		Y: Reference to set of Integers )
	if q==0 then
			// By picking the correct integers into ‘Y’, now we have
			// iteratively reduced ‘q’ to zero, so the answer 
			// subset is already in ‘Y’.
		print Y
		return  // No must do anything on this branch
	if n==0 then
			// ‘q’ will not be zero, while the input set ‘X’ is 
			// exhausted. This branch didn’t led to an answer.
		return
	// Try with ‘X[n]’ in the answer subset
	place X[n] into Y
	ssp_brute_force( X{X[n]}, n-1, q-X[n], Y )
			// Proceed searching with the reduced set 
			// (‘X’ without the last integer X[n]), for 
			// such a subset, which sums to ‘q-X[n]’.
	remove X[n] from Y
	// Try without ‘X[n]’
	ssp_brute_force( X{X[n]}, n-1, q, Y )
			// Proceed searching with the reduced set 
			// (‘X’ without the last integer X[n]), for 
			// such a subset, which still sums to ‘q’.

Upon the pseudo-code, we see that solving the issue for ‘’ input items requires solving two problems, each with ‘-1’ input items. Each of those will, in its turn, require solving two reduced problems, this time with ‘-2’ input items, resulting overall in 4 problems with ‘-2’ input items each. This fashion, the variety of problems doubles on each arrival of a brand new input item, which makes the time complexity of the brute-force algorithm exponential – (2).

In practice, this algorithm will be used provided that ‘’ is small enough.

Nevertheless, the following algorithms for the Subset sum problem which will likely be described in this text, still depend on the logic of “using vs. not using” the present item.

2. Subset sum and NP-complete problems

There’s a category of problems in Computer Science, called “NP-complete”, which consists of such problems which can be considered difficult to unravel. The described Subset Sum Problem belongs to the NP-complete class. More precisely, to ensure that an issue to be NP-complete, the next criteria should be met:

It’s a decision problem, meaning that for any input to the issue, the output is either “yes” or “no”.
- Regarding the Subset sum problem, any input consisting of a set of input values = {₁, ₂, …, } and a goal sum ‘’ leads to a “yes” or “no” answer: we either can pick a subset which accumulates towards ‘’, or we are able to’t.
When the reply is “yes”, this will be demonstrated through the existence of a brief (polynomial length) solution.
- In regard to the SSP, if it is feasible to choose and present such a subset ⊆, we are able to easily sum up all its integers, and check if that actually equals ‘’ or not.
The correctness of every solution will be verified quickly (namely, in polynomial time), and a brute-force search algorithm can find an answer by trying all possible solutions.
- The brute-force solution for Subset sum problem is what now we have just recalled within the previous chapter.
The issue will be used to simulate every other problem for which we are able to quickly confirm that an answer is correct. Hence, if we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of each other problem to which a given solution will be easily converted.

The last statement shows that there are also other NP-complete problems, in addition to that any of them will be converted into one other. So, if a polynomial-time algorithm for one NP-complete problem will likely be found, we could use it to unravel some other NP-complete problem, by converting it to the previous one. Here is a standard conversion diagram between different NP-complete problems:

We see that, for instance, the “Boolean formula satisfiability problem” will be converted into the “Subset sum problem”, while the latter one will be converted into the “Knapsack problem”.

3. The Dynamic programming solution of the Subset sum problem

The common solution of the Subset Sum Problem (SSP) uses a Dynamic Programming technique. The principle of any Dynamic Programming algorithm lies in initially solving problems of smaller sizes, and later using those solutions to unravel the initial large problem.

Let’s “” be a boolean matrix having dimensions “[0..][0..]”, with
“[][]” denoting if we are able to gather the sum ‘’ by selecting only between the primary ‘’ input items.

= {8, 4, 11, 3, 9},
= 28.

In the primary chapter, we already expressed the worth “[][]” in a recursive way. “[][]” denotes whether we are able to obtain the sum ‘’ by selecting between all of the ‘’ input integers, which is the reply to the initial problem. We addressed two cases there:

If the last item ‘’ participates within the result subset ‘’, then we should always have the ability to acquire the sum ‘–’, by selecting between the primary ‘-1’ input items. In other words, “[-1][–]” ought to be equal to “true”.
Otherwise, if the last item ‘’ doesn’t take part in the result subset ‘’, then we should always manage to acquire the goal sum ‘’ by selecting also between the primary ‘-1’ input items. In other words, “[-1][]” ought to be equal to “true” then.

There is no such thing as a third option. If either of those two conditions is satisfied, then the goal sum ‘’ is reachable. In the event that they each are satisfied, it signifies that ‘’ will be obtained each by picking ‘’ and by not picking it into the result subset ‘’.

So the formula for the initial problem becomes:

[begin{equation*}
S[n][q] = S[n-1][q-x_n] or S[n-1][q]
end{equation*}]

And it really works not just for “[][]” but additionally for any “[][]”, because the logic stays the identical:

[begin{equation*}
S[i][p] = S[i-1][p-x_i] or S[i-1][p]
end{equation*}]

Now, when filling the matrix “[0..][0..]”, we see that the worth of any cell ‘[][]’ depends only on two other cells, each situated within the row above:

This implies we are able to calculate the matrix in top-down order, which is able to guarantee that in the intervening time “[][]” is being calculated, we already know the values of each “[-1][–]” and “[-1][]”.

What’s required to begin the calculation is the content of the primary row “[0][], ∈[0..]”. The cell “[0][]” denotes if it is feasible to collect sum ‘’, having only the primary 0 input items (i.e. having no item in any respect)”. Obviously, the reply is “false” if “>0” and is “true” provided that “==0” (we are able to gather the sum 0, without using any item).

Having that said, we are able to calculate all cells of the table within the top-down order. For our input set, the result will likely be:

The pseudo-code of the Dynamic Programming solution becomes:

// Given an n-long set of integers ‘X’, returns if it is feasible 
// to seek out such a subset of it, which sums as much as ‘q’.
function ssp_dynamic_programming( 
		X[1..n]: set of Integers, 
		n: Integer, 
		q: Integer ) : Boolean
	S[0..n][0..q]: Matrix of Booleans
	// Fill first row of ‘S’
	S[0][0] := true
	for p in [1..q]
		S[0][p] := false
	// Fill content of the matrix
	for i in [1..n]
		for p in [0..q]
			if p < x[i]
				S[i][p] := S[i-1][p]
			else
				S[i][p] := S[i-1][p-x[i]] or S[i-1][p]
	// The reply is within the bottom-right cell
	return S[n][q]

The underside-right cell “[][]” will contain the reply to the unique problem, stating if we are able to gather the sum ‘’ by selecting between all of the ‘’ input items.

The presented solution requires filling the matrix “”, which has “(+1)*(+1)” cells. Calculating each cell is completed in (1) time. Hence, the time complexity of the Dynamic Programming algorithm becomes (). This solution is named pseudo-polynomial, because here the factor ‘’ will not be proportional to any polynomial of the issue’s size. In reality, ‘’ will be proportional even to the exponent of problem’s size.

4. Introducing the Interval-based algorithm

In the final case, the table “” filled within the previous chapter can have quite unpredictable content at the tip. Nevertheless, if the input values “ = {₁, ₂, …, }” do satisfy certain conditions, rows of the filled table might contain many adjoining cells with “true” values, in addition to many adjoining cells with “false” values.

= {9, 4, 3, 12, 5},
= 30.

In the present row, let’s denote those adjoining cells with a price of “true” as “true-intervals”, and the adjoining cells with a price of “false” as “false-intervals”.

If we all know upfront that the calculated table “” could have sufficiently long true-intervals and/or sufficiently long false-intervals at the tip, it is going to make sense to compute “” not cell-based (as we did within the previous chapter), but interval-based.

Then, as an alternative of representing every row as a boolean array, we'll represent it as a sequence of ordered true-intervals. Inside the current row, the true-intervals never intersect, so we are able to keep them sorted by their starting points. Also, for simplicity of the formulas to come back later, when denoting a true-interval, we'll mean a half-opened range [..). So, for example, a true-interval [5..8) will mean that on the current row, the cells 5, 6, and 7 are equal to “true”, while cell 8 is “false”.

Now, having the true-intervals (later denoted just as “intervals”, as false-intervals do not play a role in the algorithm to be described) of the previous row ‘-1’, how should we compute the intervals of the current row ‘’? Let’s recall the formula for filling the table:

[begin{equation*}
S[i][p] = S[i-1][p-x_i] or S[i-1][p]
end{equation*}]

If altering it for a moment, and assuming that there is simply the second a part of the OR-expression:

[begin{equation*}
S[i][p] = S[i-1][p]
end{equation*}]

it is going to end in all cells of the previous row being copied into the present row. In other words, for that it might be simply enough to repeat the intervals from row ‘-1’ to row ‘’.

However, if altering the formula in the opposite way, assuming that there is simply the primary a part of the OR-expression:

[begin{equation*}
S[i][p] = S[i-1][p-x_i]
end{equation*}]

it is going to still end in copying all of the cells from the previous row to the present row, but this time shifted by ‘’ positions rightwards. So that's what will likely be vital to do also with the intervals:

Finally, referring to the unique formula:

[begin{equation*}
S[i][p] = S[i-1][p-x_i] or S[i-1][p]
end{equation*}]

it requires us to do “OR” on the 2 obtained sequences of intervals, which is identical as uniting them geometrically.

Summarizing what was said above, when filling the table of true-intervals, to compute the content of row ‘’ we should always:

shift the content of row ‘-1’ to the correct by ‘’ positions, and
unite it with the unique content of row ‘-1’.

Note, this also signifies that the span of row ‘’ (right endpoint of the right-most interval) will at all times be by ‘’ units greater than the span of row ‘-1’.

So the span of any row ‘’ will be calculated as:

[begin{equation*}
span(i) = 1 + x_1 + x_2 + … + x_i = 1 + sum_{k=1}^i x_k
end{equation*}]

where the free term “1” comes due to all of the intervals being half-open (the initial interval at row 0 is [0, 1), so “(0) = 1”).

Considering that the previous row ‘-1’ has ‘’ intervals, shifting them all by will obviously require () time. Calculating the union of 2 sequences, each having ‘’ intervals, can also be performed in () time if scanning both sequences from left to right
simultaneously. This means that the transition from the previous row to the current one requires () time.

Assuming that the maximal number of intervals on any row is ‘’, the time complexity for the Interval-based algorithm becomes ().

The important note that we should make here is that the value ‘’ significantly depends on the order of input values “ = {₁, ₂, …, }”. Here is an example with “=5” values, which at the end produces lots of intervals in the table.

= {6, 11, 4, 2, 5}.

And here is the problem solved with the same set of input values “={₁, ₂, …, }”, but arranged in a different order, more precisely, in ascending order.

= {2, 4, 5, 6, 11}.

Such a behavior is quite intuitive: if we want the intervals of the current row to remain as few as possible, their span should grow as slowly as possible. And an obvious way to achieve that is to consider all the input values in ascending order.

5. Restoring the exact subset of values

The Dynamic programming solution of the Subset sum problem, which we reviewed in chapter 3, not only answers if it is possible to obtain given sum ‘’ from the input items “ = {₁, ₂, …, }”, but also specifies the exact subset of ‘’ (let’s denote it by ‘’, ‘⊆’), items of which sum up to ‘’. To retrieve ‘’, we must move over the table in the opposite direction – from bottom to top.

The last cell of the table “[][]”, which incorporates the reply to the unique problem, was calculated by the formula:

[begin{equation*}
S[n][q] = S[n-1][q-x_n] or S[n-1][q]
end{equation*}]

If “[][]” appears to be true (meaning that the goal sum ‘’ is reachable), it signifies that at the least one in all the two operands of the OR-expression can also be “true”. So, we are able to check 2 cases:

if “[-1][] == ”, it signifies that the goal sum ‘’ will be obtained also by selecting only between the primary ‘-1’ input items. So the last item ‘’ doesn't take part in ‘’, and we are able to proceed constructing ‘Y’ only from the primary ‘n-1’ items.
if “[-1][–] == ”, it signifies that the primary ‘-1’ input items participated in constructing the sum ‘–’, after which we added the last item ‘’, and resulted with the vital sum ‘’. So the last item ‘’ was actually used, and we must construct now one other goal sum ‘–’, by selecting between only the primary ‘-1’ input items.

This judgement answers whether the last item ‘’ participates within the goal sum ‘’. But the identical logic also works for determining the participation of each other input item ‘’, in some other goal sum ‘’. Recalling the final formula by which the complete table was filled:

[begin{equation*}
S[i][p] = S[i-1][p-x_i] or S[i-1][p]
end{equation*}]

the same judgement will be made within the case if any “[][] == ”, which is able to help us to know if the item ‘’ participates in making sum ‘’, or not.

So to reconstruct the whole subset ‘’, we just apply this principle repeatedly. The pseudo-code of retrieving ‘’ becomes:

// Given the filled matrix ‘S’, returns an actual subset of 
// the n-long set ‘X’, integers of which sum as much as ‘q’.
function obtain_subset( 
		X[1..n]: set of Integers,
		n: Integer, 
		q: Integer, 
		S[0..n][0..q]: Matrix of Booleans ) : Set of Integers
	if S[n][q] == false then
		return ∅  // ‘q’ can’t be obtained
	Y := empty set of Integers
	while n > 0
		// Here we all know that “S[n][q]” is at all times true
		if S[n-1][q] == true then
			// Item ‘x[n]’ will be not utilized in ‘Y’
		else
			// Item ‘x[n]’ ought to be utilized in ‘Y’
			insert X[n] into Y
			q := q-X[n]  // Stays to construct the sum ‘q-X[n]’
		// Move upwards to the preceding item
		n := n-1
	return Y

Time complexity of this function is (), as on every iteration we move upwards by one row over the table, while its height is ’+1’.

In our example, reconstructing the result subset ‘’ is performed by the next path:

= {9, 4, 3, 12, 5},
= 30.

Can we act similarly within the Interval-based solution? We see that the one thing needed to reconstruct subset ‘’ within the Dynamic programming solution is knowing the worth of cell “[][]”, which, in relation to the Interval-based solution, is knowing whether the -th sequence of intervals covers the coordinate ‘’. That will be checked by running a Binary search over the present -th sorted sequence of intervals.

Assuming that the -th sequence incorporates ‘’ intervals, binary search will take ( ) time. To retrieve the whole subset ‘’, we run binary search on each of the ‘’ sequences of intervals. If there are at most ‘’ intervals on each row, retrieval of the result subset ‘’ will run in () time. Techniques like Fractional cascading can probably be applied here to hurry up the subset retrieval process.

6. Time complexity of the Interval-based algorithm

In Chapter 4, we derived the time complexity of the Interval-based algorithm as (), where ‘’ is the variety of input values and ‘’ is the maximal variety of intervals on a row. We also noted there that the order of input values “ = {₁, ₂, …, }” matters significantly, and ‘’ generally leads to a smaller value when items of ‘’ arrive in an increasing order. Now, are there such input cases for which the worth ‘’ will likely be really small?

Case 1: A geometrical progression

First let’s consider the case when values of ‘’ form a geometrical progression with initial value of ‘1’ and a standard ratio of ‘2’:

[begin{equation*}
X = { 1, 2, 4, 8, 16, …, 2^{n-1} } : x_i = 2^{i-1}
end{equation*}]

What shape could have the set of constructed intervals?

As we already know, having intervals of the previous row ‘-1’, to calculate intervals of the present row ‘’, there are 2 steps:

offset all intervals of row ‘-1’ by ‘’ to the correct,
unite the offsets with original content of row ‘-1’.

Doing that on the mentioned input will end in:

row 0 at all times has just one interval [0, 1), stating that only the sum ‘0’ is achievable if using no input item at all.
as ‘₁=1’, the shifted interval becomes “[0,1) >> 1 = [1,2)”, and after uniting with the original one we have: “[0,1) ꓴ [1,2) = [0,2)”,
as ‘₂=2’, the shifted interval becomes “[0,2) >> 2 = [2,4)”, and after uniting with the original one we have: “[0,2) ꓴ [2,4) = [0,4)”,
as ‘₃=4’, the shifted interval becomes “[0,4) >> 4 = [4,8)”, and after uniting with the original one we have: “[0,4) ꓴ [4,8) = [0,8)”,
and so on…

We see that each row ‘’ has just 1 interval: [0,2).

= {1, 2, 4, 8, 16, …}

Of course, the presented case is very special, and it is not a surprise that any target sum ‘ ∈ [0, 2)’ can be constructed from those ‘’ numbers. If representing ‘’ in the binary numeral positional system, then its ‘1’ digits will correspond to the powers of ‘2’ (the input values), which should be added up to get ‘’.

Case 2: Slower than a geometric progression

Values grow rapidly in any geometric progression. In the previous case, as the common ratio was equal to 2, we could also express every value ‘’ as the sum of all previous values, plus 1:

[begin{equation*}
x_i = 1 + sum_{k=1}^{i-1} x_k
end{equation*}]

What if the sequence ‘’ grows more slowly? In other words, what if after sorting all values of ‘’ in an increasing order, each value ‘’ seems lower than or equal to the sum of all previous values, plus 1?

[begin{equation*}
hspace{1cm} x_i leq 1 + sum_{k=1}^{i-1} x_k hspace{1cm} (1)
end{equation*}]

To grasp how the intervals will likely be derived in such a case, let’s recall from Chapter 4 that the span of row ‘’ equals the sum of all previous and the present input values, plus 1:

[begin{equation*}
span(i) = 1 + sum_{k=1}^{i} x_k
end{equation*}]

Now, if every value ‘’ is lower than or equal to the sum of all previous values, it signifies that ‘x_i’ can also be lower than the span of its previous row:

[begin{equation*}
x_i leq 1 + sum_{k=1}^{i-1} x_k = span(i-1)
end{equation*}]

which implies that if there is simply one interval at row ‘-1’, uniting it with its offset rightwards by ‘’, will still end in just one interval at row ‘’. Initially, now we have just one interval in row ‘=0’. So, within the case when input values grow more slowly than the mentioned geometric progression (1), each row of the table could have just one interval.

= {1, 2, 3, 6, 11, 20},

Concluding this case, if after sorting all of the input values of ‘’ increasingly, now we have the constraint:

[begin{equation*}
x_i leq 1 + sum_{k=1}^{i-1} x_k
end{equation*}]

the table could have only “=1” interval at every row, and the Interval-based algorithm itself will run in guaranteed () time. Let’s note that from the sensible perspective, such a constraint might often be satisfied. If the input data is available in random order, then a further () time will likely be required to sort it.

Case 3: Almost slower than a geometrical progression

Let’s generalize the previous case 2, which had the constraint:

[begin{equation*}
x_i leq 1 + sum_{k=1}^{i-1} x_k
end{equation*}]

Now we'll allow for it to be violated every now and then.

Once that happens for some ‘’, we are able to expect that the variety of intervals ‘’ in row ‘’ will turn out to be greater. But how long can ‘’ grow? Let’s observe it on the next example:

= 7,
= { 1, 2, 5, 7, 10, 28, 30 },

It is going to be easier for us to write down down one other array ‘’, where ‘’ is the same as the sum of all preceding values:

= { 0, 1, 3, 8, 15, 25, 53, 83 }

… we see that the mentioned condition is violated on the input values ‘₃=5’ (as ‘₃>₃+1^’) and ‘₆=28’ (as ‘₆>₆+1’).

Now let’s construct the intervals. This time, because the span of all of the inputs is large (“() = 83+1 = 84”), for the last 2 rows, we'll write down the sequences of intervals, as an alternative of presenting them geometrically:

As we are able to see, if for some ‘’ the mentioned condition is violated, the variety of intervals is doubled on that row. In our example, that took place for input values ‘₃’ and ‘₆’. It happens because all of the shifted intervals appear rightwards from the span of the present intervals:

Nevertheless, on the opposite rows, the variety of intervals tends not to extend much, because a lot of them, after being united with the present row’s content, just turn into one long interval. In our example, that explicitly happened on the last row ‘₇=30’.

The naming of this text comes from the indisputable fact that if all input items “ = {₁, ₂, …, }” are close enough to one another, many intervals will are inclined to unite throughout the top-down construction of the table, so the constant ‘’ might remain bounded with certain probability. If that's the case, we'll find yourself with “() = ()” linear time solution for the Subset Sum Problem, assuming that the input values arrive in a sorted way. Otherwise, the initial sorting of values of ‘’ becomes probably the most time-consuming step, requiring additional “( )” time.

Case 4: Faster than a geometrical progression

Let’s consider yet another case, when the sorted input values
“ = {₁, ₂, …, }” grow faster than the mentioned geometric progression with a standard ratio of two. That may occur if every value ‘’ is larger than the sum of all previous values, plus 1:

[begin{equation*}
hspace{1cm} x_i > 1 + sum_{k=1}^{i-1} x_k hspace{1cm} (2)
end{equation*}]

Within the previous case 3, we already noted that after it happens for some ‘‘ (i.e., when ‘’ appears rightwards from the span of all previous values), the variety of intervals doubles, because no common fragments remain between the present and the shifted sequences of intervals.

And in the present case, when ‘’ grows faster than a geometrical progression with a standard ratio of two, it happens for each input value ‘’, so the variety of intervals doubles on every row, leading to an exponential growth.

Let’s consider the next example:

= 6,
= { 2, 5, 9, 20, 39 }.

The array with sums of all preceding values will likely be:

= { 0, 2, 7, 16, 36, 75 }.

So we see that each item ‘’ is larger than the sum of all previous items, plus 1. Constructing intervals will end in the next table:

So we see that if input values “” have the mentioned constraint (2), proceeding with the interval-based algorithm will not be the most effective idea, as it is going to end in 2 short intervals on the last row, thus requiring (2) time to run. If we all know upfront that this will likely be our case, one other decision-based algorithm will be applied, which is able to pick a vital subset of “” in linear time ().

Conclusion

In this text, now we have observed a novel approach for solving the Subset Sum Problem, called “Interval-based algorithm”. It is analogous to the Dynamic Programming solution, with the difference that here we operate not on individual cells of the table, but on its continuous ranges.

We've got observed 4 special distributions of input values, and proved that if the inputs are dense enough, the Interval-based algorithm runs in linear time. We've got also shown that when the inputs are “almost dense”, the algorithm might still work fast enough. Just like the Dynamic Programming solution, the Interval-based algorithm allows obtaining the precise subset, items of which sum as much as the given goal.

The Subset Sum Problem is one in all the NP-complete problems; thus, this text highlights one other special case of inputs, for which they will be solved fast enough.

I'm completely happy that you could have read till the tip, and thanks to your interest!

.

The Subset Sum Problem Solved in Linear Time for Dense Enough Inputs

1. Recalling the Subset sum problem

2. Subset sum and NP-complete problems

3. The Dynamic programming solution of the Subset sum problem

4. Introducing the Interval-based algorithm

5. Restoring the exact subset of values

6. Time complexity of the Interval-based algorithm

Case 1: A geometrical progression

Case 2: Slower than a geometric progression

Case 3: Almost slower than a geometrical progression

Case 4: Faster than a geometrical progression

Conclusion

What are your thoughts on this topic?
Let us know in the comments below.

Share this article

Recent posts

A Tale of Two Variances: Why NumPy and Pandas Give Different Answers

How Vision Language Models Are Trained from “Scratch”

Why Care About Prompt Caching in LLMs?

Supply-chain attack using invisible code hits GitHub and other repositories

Introducing NVIDIA NeMo Retriever’s Generalizable Agentic Retrieval Pipeline

The Subset Sum Problem Solved in Linear Time for Dense Enough Inputs

1. Recalling the Subset sum problem

2. Subset sum and NP-complete problems

3. The Dynamic programming solution of the Subset sum problem

4. Introducing the Interval-based algorithm

5. Restoring the exact subset of values

6. Time complexity of the Interval-based algorithm

Case 1: A geometrical progression

Case 2: Slower than a geometric progression

Case 3: Almost slower than a geometrical progression

Case 4: Faster than a geometrical progression

Conclusion

What are your thoughts on this topic? Let us know in the comments below.

Share this article

Recent posts

What are your thoughts on this topic?
Let us know in the comments below.