Tuesday, September 05, 2006

The Business of Blogs

Wow, considering the small spam attack we had here on Friday this article is timely reading.

When I started the EV Blog on Blogger I didn't know a whole lot about it, didn't know about the presence of blog spam, splogs, nothing. I just knew that I had heard about Blogspot before and it might be a good place to host a blog.

Now that I've been exposed directly, and now that my research shows how Blogger (and hence, Google) are part of the problem and not part of the answer... I'm toying with moving the blog over to Typepad. Here's my first draft.

It's got a couple of features that I like that Blogspot doesn't have, like "Read More..." links, categories, etc. Plus, since it isn't a free service, spam bloggers can't send bots out to open blogs for them. The signal-to-noise ratio should be pretty damn high over there.

So tell me what you think, either here or over there. I'll make this post on both sites and most likely will be posting over there henceforth.

Oh, and I promise real EV stuff coming on Wednesday! :)

Friday, September 01, 2006

Comments

Wow, two posts on a Friday. Well, I just had to moderate comments. If you want to know why, check out the Friday Tech Tips for today, specifically the comments. Grrrr. Damn bottom feeders.

Late Friday Thoughts...

I don't normally post late on Fridays but this is something I wanted to comment on...

Mangle Those Cell Phones is an interesting blog entry by Richard Stiennon talking about other blog entries (notice how circular blogging has become? Wait, I'm doing it too!) where the basic gist is that a company released a study where they bought 10 used cell phones off eBay or someplace and WHOA, found personal info on those phones.

Why is no one looking at who this company is? (Note to self: Come up with a "pulling my own hair out" smiley.) I know it's completely coincidental but *gasp* they happen to sell products to secure cell phones. Hello?

On a related note, today a group in Canada released a poll stating that Canadians overwhelmingly approved a blank-CD tax that goes to the artist and offsets piracy losses. The people who authored the poll? They're the ones who have a job collecting and disseminating that damn tax. Yeesh.

Tech Tip Friday: IndexCheck

Recent versions of EV have started shipping with utilities, which is a great thing. Symantec believes (correctly) that the better we can support ourselves, the better life is for everyone. The next few editions of TTF will concentrate on the currently shipping utils, and today we're going to start with the IndexCheck util.

IndexCheck: This util will perform a check on your indexes or index volumes. You can have it run a base check on the index or even crawl the entire index checking each word entry.

You will get massive warnings from the util about running it on live indexes. Heed these warnings and run it only on backup copies of the folders in question.
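
If you'd rather script the "work on a copy" step than do it by hand, here's a minimal Python sketch. The function name and the scratch location are made up for illustration, and it's a plain folder copy, nothing EV-specific. It assumes nothing is writing to the index folder while the copy runs.

```python
import shutil
import tempfile
from pathlib import Path


def copy_index_for_check(index_folder, scratch_root=None):
    """Copy an index folder to a scratch location and return the copy's path.

    Hypothetical helper: the idea is to point IndexCheck at the returned
    path and never at the live folder.
    """
    source = Path(index_folder)
    if not source.is_dir():
        raise NotADirectoryError(f"not a folder: {source}")
    # mkdtemp creates a fresh, uniquely named directory, so an earlier
    # copy is never overwritten by accident.
    scratch = Path(tempfile.mkdtemp(prefix="indexcheck_", dir=scratch_root))
    target = scratch / source.name
    shutil.copytree(source, target)
    return target
```

The returned path is what you would hand to the '-f' param described below.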

The basic command line parameters are listed in the docs so instead of regurgitating those here, I'll talk about each item.

-f     This param requires the path to the folder that you want to check. You will probably want to give the full path; I'm not certain whether virtual paths are accepted.

-c     This param states which tests you want to run. More on this in a minute.

-d     If you fire up dtrace in a separate command line, trace the IndexCheck service, and then include this param, IndexCheck will dump its output to dtrace alongside whatever else you are tracing. Useful if you think there's something wrong with the indexing engine. The dtrace output looks like this: link

-t     This sets the index tracing level. I always want to err on the side of too much data so I'll use this with '-t 6' to capture everything.

-ignorewarnings     Expert use only. Basically, it skips the 'Don't use this on live indexes' warnings at the start of the utility's run.

Now, the three tests you can run with the '-c' param. Here's some more detail and examples of each.

exists     This check merely runs through the index looking at basic things, a real quick once-over. Output from this check looks like this: link

words     This check will run through each and every word in the index, checking the entry. Output looks like this: link

docs     Checks each document in the index. Output will look like this: link
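
To keep the params and checks straight, here's a rough Python sketch that only assembles the argument list and doesn't run anything. The function name and the example path are hypothetical. The flags and check names are the ones described above, and the ordering (including where the log file name sits) simply mirrors the command lines quoted later in this post; I haven't verified it against the shipping docs.

```python
VALID_CHECKS = ("exists", "words", "docs")  # check names as listed above


def build_indexcheck_args(folder, check=None, trace_level=6,
                          log_name="indextrace.log", to_dtrace=False):
    """Return the indexcheck command line as a list of strings.

    Hypothetical helper: it only builds the list and never runs the util.
    """
    if check is not None and check not in VALID_CHECKS:
        raise ValueError(f"unknown check: {check!r}")
    # Same ordering as the examples in this post: -f, -t, log name, then -c.
    args = ["indexcheck", "-f", str(folder), "-t", str(trace_level), log_name]
    if check is not None:
        args += ["-c", check]  # leaving -c off lets the util pick its default
    if to_dtrace:
        args.append("-d")      # also send output to a running dtrace session
    return args


# Example with a made-up path to a backup copy of an index folder.
print(" ".join(build_indexcheck_args(r"D:\scratch\index_copy", check="docs")))
```

I left '-ignorewarnings' out on purpose, so the live-index warnings still show up.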

 

So, if you need to check the integrity of an index you could simply run through "indexcheck -f <path/folder> -t 6 indextrace.log" and let it assume the 'exists' check and dump everything to the "indextrace.log" file. This will run through the basic check and ensure that the files in the index folder are there and accurate.

If you still think there might be a problem you could run through all the docs that are in the index. That is going to be a little more thorough but will take a little more time. "indexcheck -f <path/folder> -t 6 indextrace.log -c docs"

And then if you really want to be as sure as you possibly can that things are OK, run this: "indexcheck -f <path/folder> -t 6 indextrace.log -c words" That will run through all the words in the index. Be warned that this will not be fast. A small index of only a couple megabytes of data took over 5 minutes to check 30,000+ words. The output linked above was snipped out of 250+ MB of trace log.
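
To get a feel for what that rate means on a bigger index, here's a back-of-the-envelope sketch in Python. It takes the one data point above (roughly 30,000 words in roughly 5 minutes) and assumes the time grows linearly with the word count. That's my assumption, not something I measured, and the function name is made up.

```python
# One observed data point from the run described above (both are lower bounds).
OBSERVED_WORDS = 30_000
OBSERVED_SECONDS = 5 * 60


def estimate_words_check_minutes(word_count):
    """Rough minutes for a 'words' check, ASSUMING linear scaling."""
    words_per_second = OBSERVED_WORDS / OBSERVED_SECONDS  # about 100 words/sec
    return word_count / words_per_second / 60
```

Treat whatever number comes out as a floor for planning purposes, since both observed figures were "more than".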

One more note: The documentation that ships mentions a parameter that checks if the avtrace.log file has any new entries. This flag is 'av <days>' and I couldn't get it to work. In fact, if you go to a command line and enter indexcheck /? the online help makes no mention of this flag. I think it's another instance of incorrect documentation.

However, the logic still exists. If AltaVista has any issues it will dump trace info to the avtrace.log file. There's nothing really human-readable in there, but what you really care about is whether there's ANYTHING in there at all. The file should be zero bytes long in a working index. The presence of any entries indicates possible issues; if the dates on the entries coincide with errors in the app log, call support.
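
Since the flag didn't work for me, here's a minimal Python sketch of doing the same check by hand. It just looks at the file's size and last-modified time. The function name is hypothetical, and you pass in the path to avtrace.log yourself, since I'm not assuming where it lives. The file's modified time is only a stand-in for the dates on the entries themselves.

```python
from datetime import datetime
from pathlib import Path


def avtrace_status(log_path):
    """Report whether a trace log has anything in it.

    Returns None if the file doesn't exist, otherwise a dict with
    'has_entries' (size > 0) and 'last_modified' (a datetime).
    """
    path = Path(log_path)
    if not path.is_file():
        return None
    info = path.stat()
    return {
        "has_entries": info.st_size > 0,  # a healthy log should be zero bytes
        "last_modified": datetime.fromtimestamp(info.st_mtime),
    }
```

If 'has_entries' comes back True, compare 'last_modified' against the timestamps of the errors in the app log before picking up the phone.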