January 13, 2025

This sounds like a question a programmer might ask after one medicinal cigarette too many. The computer science equivalent of “what is the sound of one hand clapping?”. But it is a question I have to decide the answer to.

I’m adding indexOf() and lastIndexOf() operations to the Calculate transform of my data wrangling (ETL) software (Easy Data Transform). This will allow users to find the offset of one string inside another, counting from the start or the end of the string. Easy Data Transform is written in C++ and uses the Qt QString class for strings. There are indexOf() and lastIndexOf() methods for QString, so I thought it would be an easy job to wrap that functionality. Maybe 15 minutes to program it, write a test case and document it.

Obviously it wasn’t that easy, otherwise I wouldn’t be writing this blog post.

First of all, what is the index of “a” in “abc”? 0, obviously. QString( “abc” ).indexOf( “a” ) returns 0. Duh. Well, only if you are a (non-Fortran) programmer. Ask a non-programmer (such as my wife) and they will say: 1, obviously. It is the first character. Duh. Excel FIND( “a”, “abc” ) returns 1.

OK, most of my customers aren’t programmers. I can use 1-based indexing.
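As a rough sketch of what 1-based indexing means in code, the function below shifts a 0-based search result up by one. It uses std::string rather than QString so that it compiles without Qt, and the name oneBasedIndexOf is hypothetical, not taken from Easy Data Transform. It returns -1 for no match, the same placeholder QString::indexOf() uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: 1-based index of the first occurrence of needle in
// haystack, or -1 when there is no match.
int oneBasedIndexOf(const std::string& needle, const std::string& haystack) {
    const std::size_t pos = haystack.find(needle);  // 0-based, or npos
    if (pos == std::string::npos) {
        return -1;
    }
    return static_cast<int>(pos) + 1;  // shift to the spreadsheet-style index
}
```

With this sketch, oneBasedIndexOf("a", "abc") gives 1, matching Excel FIND( “a”, “abc” ).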

But then things get more complicated.

What is the index of an empty string in “abc”? 1 maybe, using 1-based indexing, or maybe empty is not a valid value to pass.

What is the index of an empty string in an empty string? Hmm. I guess the empty string does contain an empty string, but at what index? 1 maybe, using 1-based indexing, except there isn’t a first position in the string. Again, maybe empty is not a valid value to pass.

I looked at the Qt C++ QString, Javascript string and Excel FIND() function for answers. But they each give different answers and some of them aren’t even internally consistent. This is a simple comparison of the first index or last index of text v1 in text v2 in each (Excel doesn’t have an equivalent of lastIndexOf() that I am aware of):

Changing these to make all the valid results 1-based and setting invalid results to -1, for easy comparison:

So:

  • Javascript disagrees with C++ QString and Excel on whether the first index of an empty string in an empty string is valid.
  • Javascript disagrees with C++ QString on whether the last index of an empty string in a non-empty string is the index of the last character or 1 after the last character.
  • C++ QString thinks the first index of an empty string in an empty string is the first character, but the last index of an empty string in an empty string is invalid.

It seems surprisingly difficult to come up with something intuitive and consistent! I think I am probably going to return an error message if either or both values are empty. This seems to me to be the only unambiguous and consistent approach.

I could return a 0 for a non-match or when one or both values are empty, but I think it is important to return different results in these 2 different cases. Also, not found and invalid feel qualitatively different to a calculated index to me, so they shouldn’t be just another number. What do you think?
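One way to keep “not found” and “invalid” apart from a real index is to return a status alongside the number. The sketch below is only an illustration of that idea, again using std::string in place of QString. The names SearchStatus, SearchResult and strictIndexOf are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the status says whether the index means anything.
enum class SearchStatus { Found, NotFound, InvalidInput };

struct SearchResult {
    SearchStatus status;
    std::size_t index;  // 1-based; only meaningful when status == Found
};

// Treats an empty v1 or v2 as invalid input rather than as a non-match.
SearchResult strictIndexOf(const std::string& v1, const std::string& v2) {
    if (v1.empty() || v2.empty()) {
        return {SearchStatus::InvalidInput, 0};
    }
    const std::size_t pos = v2.find(v1);
    if (pos == std::string::npos) {
        return {SearchStatus::NotFound, 0};
    }
    return {SearchStatus::Found, pos + 1};
}
```

A caller can then show an error message for InvalidInput and something else for NotFound, without either being mistaken for a calculated index.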

*** Update 14-Dec-2023 ***

I’ve been around the houses a bit more following feedback on this blog, the Easy Data Transform forum and Hacker News, and this is what I have decided:

IndexOf() v1 in v2:

v1      | v2          | IndexOf(v1,v2)
(empty) | (empty)     | 1
aba     | (empty)     | (blank)
(empty) | aba         | 1
a       | a           | 1
a       | aba         | 1
x       | y           | (blank)
world   | hello world | 7

This is the same as Excel FIND() and differs from Javascript indexOf() (ignoring the difference in 0- or 1-based indexing) only for “”.indexOf(“”), which returns -1 in Javascript.

LastIndexOf() v1 in v2:

v1      | v2          | LastIndexOf(v1,v2)
(empty) | (empty)     | 1
aba     | (empty)     | (blank)
(empty) | aba         | 4
a       | a           | 1
a       | aba         | 3
x       | y           | (blank)
world   | hello world | 7

This differs from Javascript lastIndexOf() (ignoring the difference in 0- or 1-based indexing) only for “”.indexOf(“”), which returns -1 in Javascript.

Conceptually the index is the 1-based index of the first (IndexOf) or last (LastIndexOf) position where, if V1 is removed from the found position, it would have to be re-inserted in order to revert to V2. Thanks to layer8 on Hacker News for clarifying this.

Javascript and C++ QString return an integer and both use -1 as a placeholder value. But Easy Data Transform is returning a string (that can be interpreted as a number, depending on the transform), so we aren’t bound to using a numeric value. So I have left it blank where there is no valid result.
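For anyone who wants to experiment with these rules, here is a minimal sketch that reproduces the two result tables above, returning text and leaving it blank when there is no match. It is not Easy Data Transform’s actual code. It assumes std::string instead of QString, whose find() and rfind() happen to place an empty v1 at the “re-insertion” positions described above. The names indexOfText and lastIndexOfText are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: 1-based index of the first occurrence of v1 in v2,
// as text. Returns an empty string when v1 does not occur in v2.
std::string indexOfText(const std::string& v1, const std::string& v2) {
    const std::size_t pos = v2.find(v1);  // an empty v1 matches at offset 0
    return pos == std::string::npos ? std::string() : std::to_string(pos + 1);
}

// Hypothetical helper: 1-based index of the last occurrence of v1 in v2,
// as text. Returns an empty string when v1 does not occur in v2.
std::string lastIndexOfText(const std::string& v1, const std::string& v2) {
    const std::size_t pos = v2.rfind(v1);  // an empty v1 matches at v2.size()
    return pos == std::string::npos ? std::string() : std::to_string(pos + 1);
}
```

For example, lastIndexOfText("", "aba") gives "4", one past the last character, while indexOfText("aba", "") gives an empty string.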

Now I have spent enough time down this rabbit hole and need to get on with something else! If you don’t like it you can always add an If with Calculate or use a Javascript transform to get the result you prefer.

*** Update 15-Dec-2023 ***

Quite a bit of debate on this topic on Hacker News.