Excel Whiz Required

Status
Not open for further replies.
So I have this massive spreadsheet that was given to me with one column with about 100000 rows that require to be cleaned up. The client, the tech peasant that he is, over the course of the month entered it in this way.

What I need to do is to clean the column and place select rows them in the appropriate column I'll create. Everything is a number in here with the only flags is that the client put a special character at the end such as : and *

Don't ask.

Sadly, if it was at the start, I could have done something pretty easily.

What can I do?


Here's a sample size.


J5678234
0944645 :
H812634
0906618 :
A467832
0906843 :

I want it to look like

Column A :

J5678234
H812634
A467832

Column B :

0944645 :
0906618 :
0906843 :
 
Hmm... I can't get you there in one step, but I can provide an easier way to determine which rows need to be moved to column B; but it may be version dependent.
 
I found a likely solution (here):

formula option

starting in row1,

=INDEX(A:A,1+(2*ROW()-1))

for the even positions

=INDEX(A:A,1+(2*ROW()-2))

for the odd
 
Do they always alternate like that, or did you just write it that way?

If they don't actually alternate, and you need the rows with special characters to include their special characters, then I would suggest setting up two columns, B and C next to your data column, then in B1, input (where : is the special character):

=IF(ISNUMBER(FIND(":",A1)),"",A1)

And in C1, input:

=IF(ISNUMBER(FIND(":"A1)),A1,"")

Then drag the formulas down.

In theory, if it doesn't find the character, the string will get printed in column B, and if it does find the character, the string will get printed in column C.

The delete column A, filter out the blanks, and you're done.

EDIT: fixed a typo.
 

Necronic

Staff member
Do you have access to Microsoft Access?

If so you can import this to access and solve it w/ a relatively simple query. Even if you've never used it it would be pretty simple.

Edit: I wrote something similar to above (using the "RIGHT" function instead) but you'll need to copy those values (as values) to new columns or deleting column A will empty all of it. The reason I didn't include this solution is that filtering the blanks isn't trivial (or is it?)
 
=INDEX(A:A,1+(2*ROW()-1))
=INDEX(A:A,1+(2*ROW()-2))
To clarify, this is a "use in two columns other than the original" solution. I verified that it works on the two versions of Excel that I have immediately available (2000 and 2003).
 
I moved on to another project for now and will try there maybe Wednesday but to answer you Necronic, yes, I can use Access.
 

Necronic

Staff member
Ok, so here's what you do.

1) Import the spreadsheet to a table in access. Be careful not to let it use the first row as a field heading (assuming it's a number).
2) Select query design
3) Add the table to the query design
4) Select the field w/ the data (should be "field1" unless the import messed up). Double click it so it ends up on the
5) In the bottom there will be line marked "Criteria". Under the column that your field is in enter the following

Not Like "*:*"

Hit run, or view table (in the upper left corner of the ribbon)

6) Select all of the numbers given (select the column) and copy them into a column in excel

7) Now go back to access and the query and switch it back to design mode (where the view table thing was)

Change the Not Like line to this:

Like "*:*"

Hit run, or view table.

8) Select all of the numbers given (the column) and copy them into a column in excel.

------------

The order should be preserved in both cases. One issue with doing this within excel is that excel has problems dealing with datasets this large. I'm actually surprised it can handle 100k rows. Access will handle it easily and will do this in the blink of an eye.
 
Or, add another column to your current worksheet, formula:
=IF(ISNUMBER(FIND(":",A1)),CONCATENATE(":",A1),IF(ISNUMBER(FIND("*",A1)),CONCATENATE("*",A1),A1))

Drag it down your 100,000 rows. What it does is it adds the : or the * to the beginning of the number. Once that's done, copy those 100,000 rows into a new sheet, and paste the value. Sort the values and it'll break up into a group of * ####, : #### and then your real numbers at the bottom.

Of course it F's up your ordering, but your specs don't require that, so HAH.
 
Ah, Excel. Pretty much the introduction to my current interest in programming. I was always surprised at how robust the cell fuctions could be if you had the patience to find all the right keywords to "program" the cells.

Doing spreadsheets and pivot tables was my favorite part of being an analyst.

Jesus CHRIST, I'm a nerd.
 
Jay approves +30.

Thanks dude, this is awesome.

And for all others who attempted to assist, you earn all the following :


psIdL.jpg
 
I'd like to suggest that what you might really be after is just some sort of text manipulation.
I mean, from the example you gave, you could just export the whole thing as a txt file, then regex [A-z,0-9]\r[A-z,0-9] into [A-z,0-9],[A-z,0-9] (turn a plain newline into a comma) and colon/space/newline into just newline, which would give you the following:

J5678234,0944645
H812634,0906618
A467832,0906843

Then just import it as a CSV and you'd be all set. I'm no grepmeister, but it seems like it would be pretty straightforward.

--Patrick
 
Status
Not open for further replies.
Top