Export thread

A list to show what's missing?

#1

Chad Sexington

Chad Sexington

I have a list of files, that are all numbered, from 0000 to 2161. However, in the folder, only 1665 files are present, which means I'm missing 495 files.

Unfortunately, it's not one big chunk that is missing, and I can't seem to find a pattern in what isn't present. How can I search or sort the numbers, to what numbers aren't present between 0000 and 2161?

For the curious, or who might see a pattern, the missing numbers between 0000 and 0110 are below. I did this by hand just to see if I could find a pattern, but I do not think I can. It's a total of 32 missing files for the first 111 numbers:

0004, 0005, 0006
0012, 0013
0022, 0023
0025
0034
0036, 0037, 0038, 0039
0043
0051, 0052
0058
0060
0068
0072, 0073
0076
0091
0092
0095
0097
0100, 0101
0104, 0105
0107, 0108


#2

strawman

strawman

From the command line do

dir /b > filelist.csv

Then open that file in excel. Sort the data by that column so the numbers go in order. Using find and mid/left/right extract the number into the B column. Maybe something like =VALUE(MID(A1,5,4)) . Then use the following formula in the C column:

=IF(B2-B1>1,B2-B1-1,"")

If the number skips, then it'll print the number of files missing in this instance. It'll print nothing if the files don't skip.[DOUBLEPOST=1389632283,1389631755][/DOUBLEPOST]There's probably a more elegant way to extract just the missing numbers from this, but since it's probably a one-off job I'd just continue on with easy stuff.

Go ahead and change column C to a simpler =B2-B1-1, then in column D do:

=IF(C2>0,B1+1,"")

In column E do:

=IF(C2>1,B1+2,"")

This will return the missing files if you have runs up to 2 in a row missing. You'll have to continue the pattern in F, G, H, and so forth if you find you have runs of numbers much longer.


#3

Chad Sexington

Chad Sexington

From the command line do

dir /b > filelist.csv
This does not seem to work, but I may be doing something wrong. /b anything turns out

"/be is not recognised as an internal or external command, operable program or batch file."


#4

strawman

strawman

Do you have the "dir" in front of it?

Code:
dir /b > filelist.csv
[DOUBLEPOST=1389634960,1389634830][/DOUBLEPOST]You should be able to do

Code:
dir
Alone to get a directory list. Then

Code:
dir /b
Will get you a directory list without all the extra information, just the file names.

The final part is a "pipe" and steals the output from the screen and puts it in a file:

Code:
dir /b > filelist.csv
Now you should have a filename called "filelist.csv" in that directory with the list of files.


#5

Chad Sexington

Chad Sexington

I figured out my mistake; it wouldn't work with a space between > and filelist.csv. Now to Excel! Thank you so much, @stienman


#6

evilmike

evilmike

From the command line do

dir /b > filelist.csv

Then open that file in excel. Sort the data by that column so the numbers go in order. Using find and mid/left/right extract the number into the B column. Maybe something like =VALUE(MID(A1,5,4)) . Then use the following formula in the C column:

=IF(B2-B1>1,B2-B1-1,"")

If the number skips, then it'll print the number of files missing in this instance. It'll print nothing if the files don't skip.[DOUBLEPOST=1389632283,1389631755][/DOUBLEPOST]There's probably a more elegant way to extract just the missing numbers from this, but since it's probably a one-off job I'd just continue on with easy stuff.

Go ahead and change column C to a simpler =B2-B1-1, then in column D do:

=IF(C2>0,B1+1,"")

In column E do:

=IF(C2>1,B1+2,"")

This will return the missing files if you have runs up to 2 in a row missing. You'll have to continue the pattern in F, G, H, and so forth if you find you have runs of numbers much longer.
More generally, you could do something like this:

Paste your file list into Column A.
Use the VALUE(MID(A1,5,4) formula in column B.
Series fill column C with your entire list of numbers.
Use IF(LOOKUP(C1,B$1:B$2161)<>C1,"Missing","") in column D.
Use an autofilter to look for non-blank cells in column D.
You should now have a list of all missing files.


#7

Chad Sexington

Chad Sexington

More generally, you could do something like this:

Paste your file list into Column A.
Use the VALUE(MID(A1,5,4) formula in column B.

Using find and mid/left/right extract the number into the B column. Maybe something like =VALUE(MID(A1,5,4))
Okay, what is this supposed to be?
=VALUE(MID(A1,5,4)

Obviously I can't just copy this formula, it just pritns "#VALUE!" and I'm not sure what you guys means by using the B column to extract the number: the only thing in column is the list of numbers anyway


#8

strawman

strawman

If your filenames look like this: "file0123.txt" then the formula I gave above returns just the number. It looks at A1, and, starting with the fifth character, and gives up to 4 characters. That's what the MID function does - returns the middle of a text string. The VALUE function just tries to turn it into a number, rather than text.

If your filenames all start with the same prefix, then this will work. If they have prefixes of varying length we'll have to try something else.

Take out the VALUE function and play with the 4 and 5 values until it looks like a number:

=MID(A1,5,4)[DOUBLEPOST=1389636382,1389636352][/DOUBLEPOST]Alternately, post 5-10 examples of the filenames and I'll give you the formula you need.


#9

evilmike

evilmike

We were assuming that your file names had something other than just numbers in them. If the only thing on each cell is the number in text format, you should be able to use =VALUE(A1).

Also, can you attach the text file that has the file name list to your next message?


#10

Tinwhistler

Tinwhistler

Can you attach your csv file here? I can write a quick and dirty that'll do the job for you


#11

strawman

strawman

...can you attach the text file that has the file name list to your next message?
Can you attach your csv file here? I can write a quick and dirty that'll do the job for you
"If you give a man a fire, he'll be warm for the night. If you set a man on fire, he'll be warm for the rest of his life."


#12

Chad Sexington

Chad Sexington

Sure sure.

I see the problem, then. No, my filenames are all just 0000, 0001, etc. And I need a list of the numbers that don't show up between 0000 and 2161. Attached is a text file; this is what shows up in Excel's A column.

Thanks you guys. I feel ... like an idiot some times :D

Attachments



#13

Tinwhistler

Tinwhistler

Sure sure.

I see the problem, then. No, my filenames are all just 0000, 0001, etc. And I need a list of the numbers that don't show up between 0000 and 2161. Attached is a text file; this is what shows up in Excel's A column.

Thanks you guys. I feel ... like an idiot some times :D
*you* feel like an idiot? I can't find the attach button.
(n/m...found it.heh)

Attachments



#14

evilmike

evilmike

And, in case you want to set yourself of fire later, here is an excel file that handles the filtering.

Attachments



#15

Tinwhistler

Tinwhistler

Not clean, optimized or elegant, but it got the job done in 2 minutes of work, and 4 seconds of runtime

Code:
		Dim strFiles As String() = TextBox1.Text.Split(vbNewLine)
		For i = 1 To 2161
			Dim bFound As Boolean = False
			For Each strFile As String In strFiles
				If Val(strFile) = i Then bFound = True
			Next
			If bFound = False Then TextBox2.Text &= i & vbCrLf
		Next
[DOUBLEPOST=1389638210,1389637863][/DOUBLEPOST]I'm embarrassed. I mixed two CRLF constants there, vbNewLine and vbCrLf.

They're the exact same thing, only vbCrLf is a holdover from old (like vb4.0) days, and I still catch myself using it sometimes ;)


#16

strawman

strawman

I still use vbCrLf because I touch VB so infrequently these days. However, they are only the same on systems where a newline is encoded as CR LF. In theory you should find that vbNewLine produces only LF on linux systems, though I have no idea if mono actually does this or not.


#17

Chad Sexington

Chad Sexington

Thanks guys. I will look over this stuff since I want to, (and should) understand it better. It's a long story, but this will help my employees work over the next couple of days. Thank you very kindly.


#18

Tinwhistler

Tinwhistler

I still use vbCrLf because I touch VB so infrequently these days. However, they are only the same on systems where a newline is encoded as CR LF. In theory you should find that vbNewLine produces only LF on linux systems, though I have no idea if mono actually does this or not.
vbNewLine is a constant, and as such, it should always produce CrLf.

Environment.Newline is a property of the Environment class and should produce the appropriate newline (LF or CrLf) depending on the platform.


#19

PatrThom

PatrThom

This is the sort of thing that would be best handled by a program rather than a macro.
Heck, even in BASIC it would be easier to just FOR/NEXT through the numbers 0000 to whatever and report whether or not it was able to touch xxxx.txt for each filename.

--Patrick


Top