Choosing a Laptop – 2019

I have had many people over the years ask me what to look for when purchasing a good laptop.  That has changed over the years as we have seen a shift into multi-core computation and reliance on SSD technology.  So, here is a current run-down of buying tips (in order):

1. Buy just inside your budget, but do spend as much as you can for it, since that will probably make it last as long as it can for you.  Buying the cheapest means you’ll likely need to replace it earlier.
2. Memory:  If you are comparing multiple computers, at this point go with the one with the most RAM.  8GB is standard now for mid-level laptops, 16GB is even better.  2-4GB is doable for Chromebooks and it works for a netbook, but it’s not good for doing anything “big” on the computer.  (For any real work I use my desktop or laptop with 16-32GB of RAM each.)
3. Cores: The next thing to look at is the number of cores in the processor.  It goes hand in hand with the RAM, in that you want as many as possible.  If forced to choose, I’d take more RAM over more cores.  This information would likely be in the fine-print of the computer details, but it will say “x-core processor.” I’m not too worried about the brand (Intel or AMD) at this point.  Typical low-end laptops/Chromebooks have 2 cores.  For longevity, I’d go for at least a 4-core if possible. (The best option now would be a 4-core “hyperthreaded” processor, which shows up to the operating system as 8 logical cores, although it will not be as fast as 8 physical cores. You will find this on most Intel processors.)
4. Type and speed of processor:  This is secondary to the number of cores, mostly.  Intel has the reputation of being the best, but AMD has a new line of processors that are supposedly awesome.  However, in general, AMD’s processors are cheaper to buy, giving you more options in the lower-cost machines that could potentially be “faster” than their Intel counterparts.  (For desktops, I buy AMD to get more cores and speed for the price).  That is to say, after RAM and number of cores, I’d pick an Intel i9, i7 or i5 line over the AMD chips (A-series processors), but I’d pick AMD’s A10, A8 over the Intel i3, Pentium, Celeron, or Atom models.  At this point, don’t buy a laptop with an ARM processor (that day is still coming).  Secondly, get the fastest processor of the best line you can (higher GHz).  Since the multi-core revolution, I’d say number of cores wins over speed of the core, since it allows the machine to do more at once, even slowly.
5. Hard drive: SSDs are faster. Period.  Do not get a laptop anymore with a rotational hard drive (HDD). SSDs are prevalent and the prices have come down significantly, so you should get a laptop with an SSD in the size you are looking for. 256GB would probably be the smallest I’d suggest, especially if you are using Windows.
6. Brand: Last but not least, get a brand you know.  Apple is known for its customer support, and I’ve had wonderful experiences with them. Dell, HP, and Lenovo seem to be the go-to PC brands, and they’ve been around and solid for a while.  Asus and Acer are also great brands.
7. Weigh the costs of getting an extended warranty (e.g., Square Trade, Geek Squad, etc). Many PC manufacturers will force you to pay shipping to have the laptop sent for inspection, and then decide whether it’s a free repair (but you still pay shipping) or a non-covered issue where you pay shipping and repair costs. Some warranties may be worth it, but I haven’t used them. Apple’s AppleCare+ has been worth it every time I’ve bought a Mac/iPad.

Quick Multi-File Rename in Linux

Call it the little things, but I’m usually excited when I find a new tool that speeds things up.  I accidentally ran a set of experiments that produced multiple files that were named incorrectly.  That is, I wrote files as aq*-female-1_day-*_event*.csv but I was actually supposed to write the files as aq*-female-edt-*_event*.csv.  These files fit into a larger set of results with varying time lengths (edt, month, year, etc), so I needed to change dozens of files to fix my mistake.

Cue the head scratching.  Do I rename each individually?  Do I write a bash or PHP script to perform the renaming, perhaps taking time away from other things? No! I found out there’s a tool for that built into the Linux command line environment: rename.

The rename command takes a Perl regular expression and a set of files and performs a rename on them based on the regular expression.  So, in this way, I could rename dozens of files with one command:

> rename 's/(.*)-1_day-(.*)/$1-edt-$2/' *.csv
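One caveat: on some distributions the rename command is the util-linux version, which does not accept a Perl expression. As a fallback, here is a minimal sketch that does the same substitution with a plain shell loop and sed. The function name rename_1day_to_edt is made up for this example, and it assumes the files sit in the current directory.

```shell
# Rename every *-1_day-*.csv file in the current directory so that
# "-1_day-" becomes "-edt-". Hypothetical helper, not a standard tool.
rename_1day_to_edt() {
  for f in *-1_day-*.csv; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # Same pattern as the rename expression, written as a basic sed regex.
    new=$(printf '%s\n' "$f" | sed 's/\(.*\)-1_day-\(.*\)/\1-edt-\2/')
    mv -- "$f" "$new"
  done
}
```

Swapping mv for echo mv in the loop gives a quick preview of what would be renamed before committing to it.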


Net Neutrality

Earlier today, I came up with this analogy for the internet and net neutrality:

Let’s say UPS owns I-95, FedEx owns I-64, and Joe’s Shipping owns I-295 and 288. With net neutrality, anyone could drive on any interstate, including UPS trucks on 64, without additional cost. There is some negotiation between companies for the interchanges between 64, 95, and 295.

Without net neutrality, it becomes much more problematic. FedEx could charge a fee for UPS trucks on 64, and vice versa. Joe’s Shipping is such a small company that they may not be able to afford new charges to make deliveries using 64 or 95; therefore, they end up needing to use back roads, which would affect delivery speed.  They would lose to the companies who could take the faster routes and ultimately can’t compete with UPS and FedEx speeds. So, they fold and sell 295 and 288 to the other companies.

Perhaps you own a store along 64, but depend on a supplier from the DC area for your products. If the supplier is a small company, you or they would have to absorb the additional cost of shipping fees from both shipping companies to use both roads to get the delivery to you (i.e. UPS delivery fee plus the FedEx fee that UPS pays to use 64). But, if the supplier is already a big player, like Amazon or Walmart, they will likely have a second warehouse off 64, so they can still offer the lower FedEx-only shipping charges.  Therefore, small suppliers can’t compete with already established large corporations.

And, what would be even worse: what if UPS and FedEx owned their own supply companies? Then perhaps you buy their products and shipping, because they charge anyone else extra to use either of their roads.

And that’s where we are today. Comcast and Verizon own large swaths of the internet and its interconnection, and they produce content (tv, movies, websites, etc). AT&T, which also owns portions of the internet, is trying to acquire Time Warner, including its production companies.

So, that should be terrifying. Even if they are transparent about how much they charge, it’s still not neutral. There aren’t enough back-channels to help all content get everywhere.

Now, I know you might be thinking “well, I pay for Ting, Google Fiber, [insert your good company here] internet, so they won’t play favorites with content.”  But, it’s not just about them; the internet is a very deep and complex network.  At its base is a backbone controlled by multiple different companies, some that you may have never heard of.   Your web content may pass through a few different companies on top of the one that you actually pay for internet access.  Without net neutrality, any one of them along the way has the ability to stop or slow your data or charge a fee.

There are a few things you can try to test out the internet for yourself and see what companies you’ll need to deal with to do rather mundane things online.  These are traceroute and whois, and they’re freely available in Terminal (macOS) and on the Linux command line; Windows’ Command Prompt has a comparable command called tracert.

Example Usage: My website

Let’s take a look at getting to my site, robbiehott.com, from my in-laws’ house.  From the terminal, we will execute the command traceroute robbiehott.com, which will provide us with the following response:

traceroute to robbiehott.com (208.113.162.147), 100 hops max, 60 byte packets
1 gateway (10.0.0.1) 2.745 ms 3.198 ms 4.602 ms
2 96.120.18.205 (96.120.18.205) 25.460 ms 25.820 ms 26.449 ms
3 ge-3-1-sr01.palmyra.va.richmond.comcast.net (68.86.127.69) 25.772 ms 25.879 ms 27.830 ms
4 96.108.140.57 (96.108.140.57) 27.624 ms 27.535 ms 27.511 ms
5 ae-18-ar02.charlvilleco.va.richmond.comcast.net (68.86.173.213) 34.516 ms 34.505 ms 34.451 ms
6 be-21508-cr02.ashburn.va.ibone.comcast.net (68.86.91.53) 36.430 ms 20.432 ms 24.491 ms
7 hu-0-11-0-3-pe04.ashburn.va.ibone.comcast.net (68.86.88.78) 27.858 ms 27.145 ms 27.138 ms
8 50.242.151.190 (50.242.151.190) 33.523 ms 33.494 ms 32.625 ms
9 207.88.14.164.ptr.us.xo.net (207.88.14.164) 33.410 ms 36.523 ms 38.295 ms
10 207.88.14.181.ptr.us.xo.net (207.88.14.181) 37.674 ms 36.776 ms 39.484 ms
11 209.48.43.58 (209.48.43.58) 39.854 ms 40.663 ms 40.865 ms
12 ip-208-113-156-4.dreamhost.com (208.113.156.4) 26.594 ms 27.676 ms 28.477 ms
13 ip-208-113-156-14.dreamhost.com (208.113.156.14) 22.865 ms 23.073 ms 23.237 ms


This list shows all the steps between my laptop and my website.  You’ll notice it’s backwards; that is, these are the steps to my website.  However, the website data will take roughly the same path back to my laptop.  Let’s unpack this a little:

• Step 1 is the gateway, i.e. the router in the house that my laptop connects to on wifi.  If your first entry starts with 10 or 192.168 (or 172.16 through 172.31), then that is a private local network address and likely your router.
• Steps 2-8 are all routers or computers at Comcast.  Steps 3 and 5-7 specifically tell us that they are comcast.net, and we see my request going from Palmyra to Charlottesville to Ashburn.
• Steps 9-11 are all routers or computers at  MCI Communications (remember them?  well, they’re actually Verizon now).  They don’t advertise that fact here, but I’ll show you how to get that information in a minute.
• Steps 12-13 are computers at DreamHost, where my website resides.

How do I know that step 9 is Verizon?  Our second command will give us that information: by typing whois 207.88.14.164 into our terminal, we get a response from an address registry that details the owner of that particular address.  In this case, the important part is:

Organization: MCI Communications Services, Inc. d/b/a Verizon Business (MCICS)
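If you would rather not copy each address by hand, the two commands can be chained. Below is a minimal sketch: hop_ips is a made-up helper name, and it assumes traceroute output in the same layout as the listing above (hop number, name, then the address in parentheses).

```shell
# Print the IP address of each hop from traceroute output on stdin.
# Hypothetical helper; skips the header line and hops that timed out (* * *).
hop_ips() {
  awk '$1 ~ /^[0-9]+$/ && $3 ~ /^\(/ { gsub(/[()]/, "", $3); print $3 }'
}

# Example use (needs network access, so it is left commented out):
# traceroute robbiehott.com | hop_ips | while read -r ip; do
#   printf '%s: ' "$ip"; whois "$ip" | grep -m1 '^Organization:'; echo
# done
```

Not every registry labels the owner with an Organization: line, so some hops may come back blank and need a look at the full whois output.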


In an age without net neutrality, my site could be slowed down by either Comcast or Verizon, even though my website is hosted at DreamHost.  You’ll see images of “plans” that speculate paying extra for the “news websites” package or the “streaming video sites” package, but the actual case is more complicated than that.  My in-laws could pay Comcast extra for the “personal websites” package, but that won’t affect Verizon’s handling of my website data.

This is a simple example because it is likely that DreamHost pays Verizon for internet access and my in-laws pay Comcast, but there are cases in which the internet traffic will pass through an intermediary company.  I encourage you to go forth and test this out.  You’ll find companies like Fox News that pay a company called Akamai, which provides those “warehouses” from my analogy–places on your network that may use only your internet provider to deliver faster responses.  You’ll see companies like Level3 that you may have never heard of.

When you’re done, and you’re convinced something needs to be done, there are a few things you can do to try to influence what’s happening at the FCC:

1. Call your representatives in Congress and ask them to support net neutrality.  (Don’t email, call.  Someone has to take your call.)
2. Comment with the FCC.  They are supposed to take these into account when making the decision.
3. Vote in 2018.

Quick Bar-Chart of disk usage

Today I was in search of a command that I had used a long time ago, but ran into a much more interesting one instead.  At the time, I must have been needing to discover what files were the largest disk hogs and if there was a long tail (i.e. how many of the 3.7M files in this directory–not my fault, by the way–were inconsequential).  That brings us to this wonderful “one-line” command:

find /dir/ -name "*.xml" -exec du -s {} \; | perl -ni -e 'if (/^(\d+)\s+(.*)/) { $h{$2} = $1; if ($max < $1) {$max = $1; } if (length($2) > $maxfname) {$maxfname = length($2); } } END { map {$barlen = ($h{$_} / $max) * 50;$bar = "*" x $barlen; printf ("%" .$maxfname . "s" . "(%5d): %s", $_,$h{$_},$bar); print "\n"; } sort { $h{$b} <=> $h{$a} } keys %h }' 2> /dev/null > report.txt


What that does is find every XML file in the /dir/ directory and use the Linux du command to get each file’s size.  That list of filenames and sizes is passed to a hacky perl script that pulls out the size, creates a horizontal histogram bar scaled to the largest size (at most 50 *s wide), then sorts and prints the list from max to min.  Lastly, that’s saved to report.txt.

That’s quite a quick and dirty trick, but produces a nice command-line output like this:

/dir/w6bz9whg.xml(36560): **************************************************
/dir/w6km312r.xml(31772): *******************************************
/dir/w68d03gz.xml(27728): *************************************
/dir/w6vt5fhv.xml(27076): *************************************
/dir/w6m07v80.xml(17420): ***********************
/dir/w68m0zj8.xml(15276): ********************
/dir/w6mq7qpz.xml(15052): ********************
/dir/w6vq30tq.xml(13808): ******************
/dir/w6tb51hr.xml(13160): *****************
...
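For anyone who would rather not decode the Perl, the same idea can be sketched as a small shell function built on sort and awk. This is a rough equivalent, not a drop-in replacement: the name bar_chart is made up for this example, it expects "size path" lines like du -s prints, and it does not pad the file names to a common width the way the one-liner does.

```shell
# Read "size<whitespace>path" lines on stdin and print one bar per line,
# largest first, scaled so the largest entry gets 50 stars.
bar_chart() {
  sort -rn | awk '
    NR == 1 { max = $1 }   # input is sorted, so the first size is the largest
    {
      size = $1
      name = $0
      sub(/^[0-9]+[ \t]+/, "", name)   # everything after the size is the path
      len = (max > 0) ? int(size / max * 50) : 0
      bar = ""
      for (i = 0; i < len; i++) bar = bar "*"
      printf "%s(%5d): %s\n", name, size, bar
    }'
}

# Example use (commented out so the block has no side effects):
# find /dir/ -name "*.xml" -exec du -s {} + | bar_chart > report.txt
```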


Choosing a Laptop

I’ve had many people over the years ask me what to look for when purchasing a good laptop.  That has changed over the years as we have seen a shift into multi-core computation and reliance on SSD technology.  So, here is a current run-down of buying tips (in order):

1. Buy just inside your budget, but do spend as much as you can for it, since that will probably make it last as long as it can for you.  Buying the cheapest means you’ll likely need to replace it earlier.
2. Memory:  If you are comparing multiple computers, at this point go with the one with the most RAM.  4GB is standard now for mid-level laptops, 8GB is even better.  2GB is doable for Chromebooks and it works for a netbook, but it’s not good for doing anything “big” on the computer.  (For any real work I use my desktop or laptop with 8GB of RAM each.)
3. Cores: The next thing to look at is the number of cores in the processor.  It goes hand in hand with the RAM, in that you want as many as possible.  In the worst case I’d trade off more RAM for fewer cores.  This information would likely be in the fine-print of the computer details, but it will say “x-core processor.” I’m not too worried about the brand (Intel or AMD) at this point.  Typical low-end laptops have 2 cores.  For longevity, I’d go for at least a 4-core if possible.
4. Type and speed of processor:  This is secondary to the number of cores, mostly.  Intel has the reputation of being the best, followed by AMD.  However, AMD’s processors are cheaper to buy, giving you more options in the lower-cost machines that could potentially be “faster” than their Intel counterparts.  (For desktops, I buy AMD to get more cores and speed for the price).  That is to say, after RAM and number of cores, I’d pick an Intel i7 or i5 line over the AMD chips (A-series processors), but I’d pick AMD’s A10,A8 over the Intel i3, Pentium, Celeron, or Atom models.  At this point, don’t buy a laptop with an ARM processor (that day will come soon).  Secondly, get the fastest processor of the best line you can (higher GHz).  Since the multi-core revolution, I’d say number of cores wins over speed of the core, since it allows the machine to do more at once, even slowly.
5. Hard drive: SSDs are faster. Period.  However, they’re expensive and therefore smaller in size.  My netbook has a 32GB SSD hard drive, and it’s been full for months.  So, I can’t do much with it.  If you want to store music, documents, and a lot more, get one with a rotational HDD.  It may be a little slower, but it can store a lot more.  Plus, for a cheaper cost down the road, you can replace the HD with an SSD (about a $100 upgrade).
6. Brand: Last but not least, get a brand you know.  Dell and Lenovo seem to be the go-to PC brands, and they’ve been around and solid for a while.  Asus and Acer are also great brands.  I’d personally stay away from HP for now and very off-brands.

Hardness and Political Choices

Right image: 2012 Election Results (1), Left image: Hardest Places to Live in the US (2).

A few weeks ago, the New York Times posted a great article on the hardest places to live in the United States, based on education, median income, unemployment rate, disability rate, and a few other factors.  It is an incredible article, and I recommend reading it at nytimes.com.  As soon as I saw their graphic, I immediately wondered if there was a connection between political persuasion and hardness.  To look at this, I grabbed Mark Newman’s version of the 2012 election results and a nice image comparator so that the two maps can be compared on top of each other.  The results are interesting!

For clarification: On the election results (left), each district is colored on a gradient from blue to red based on percentage of the vote for the winning candidate (purple would mean an even split Obama/Romney).  For the NYTimes hardness results (right), districts are colored on a gradient from orange to green, where orange is worse (harder to live) and green is better.

What to make of it?  I don’t know.  In the north-east (areas including New England, Kentucky, Michigan, Illinois, and parts of Virginia), it appears that the more liberal areas are usually the easier places to live, and the harder places to live are usually more conservative.  However, in the mid-west (the entire middle of the country west to California), it appears to be just the opposite.  In any case, it’s interesting to think about!

References

1.  Newman, Mark. “Maps of the 2012 US Presidential Election Results.” N.p., 8 Nov. 2012. Web. 14 Oct. 2014. <http://www-personal.umich.edu/~mejn/election/2012/>.

2.  Flippen, Alan. “Where Are the Hardest Places to Live in the U.S.?” The New York Times. The New York Times, 25 June 2014. Web. 14 Oct. 2014. <http://www.nytimes.com/2014/06/26/upshot/where-are-the-hardest-places-to-live-in-the-us.html>

VI Tricks

I may be stuck in the past, or like punishment, but my editor of choice is still VIM.  However, certain tricks seem to be hard to find on Google searches, so I’m going to compile them here:

• Creating custom commands and keyboard mappings is easy in VIM.  To create a custom command, list the command in the .vimrc file.  The % character includes the current buffer’s filename in the shell command.
command CommandName execute "!shellcommand %"
This command can be run in VIM using the standard :CommandName convention. To map this new command to a keyboard shortcut, use the map command in the .vimrc file.
map <F5> :CommandName<CR>

Command Line Tricks

I am always using command line shortcuts to do various tasks, and I often have to look up the tricks every time I need to do something remotely fancy.  Here are some of my most-used helpful hints:

• To remove the leading spaces and tabs from each line of text on standard input (so use with a pipe for the input), this sed command will work well:
sed -e 's/^[ \t]*//'
• Reformatting XML/HTML files so that line returns inside tags are removed:
xmllint --format --noblanks infile.xml > outfile.xml
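A note on the sed trick above: the \t inside the brackets is understood as a tab by GNU sed, but other sed implementations (the one shipped with macOS, for instance) may treat it as a literal backslash and the letter t. A sketch of a more portable spelling uses the POSIX [[:blank:]] class, which matches exactly space and tab. The function name strip_leading_blanks is made up for this example.

```shell
# Remove leading spaces and tabs from every line on stdin.
# [[:blank:]] is the POSIX character class for space and tab.
strip_leading_blanks() {
  sed -e 's/^[[:blank:]]*//'
}
```

It is used the same way, at the end of a pipe, for example: cat infile.txt | strip_leading_blanks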

Boots: New Machine Learning Approaches to Modeling Dynamical Systems

Large streams of data, mostly unlabeled.

Machine learning approach to fit models to data. How does it work? Take the raw data, hypothesize a model, use a learning algorithm to get the model parameters to match the data.

What makes a good machine learning algorithm?

• Performance guarantees: $$\theta \approx \theta^*$$ (statistical consistency and finite sample bounds)
• Real-world sensors, data, resources (high-dimensional, large-scale, …)
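
As a reading aid for the first bullet, here is one common way to write those two guarantees down. The notation is an assumption of these notes, not something taken from the talk: $\hat{\theta}_n$ stands for the parameters learned from $n$ samples, $\theta^*$ for the true parameters, and $\delta(n, \epsilon)$ for some bound that shrinks as $n$ grows.

$$\hat{\theta}_n \xrightarrow{\;p\;} \theta^* \quad \text{as } n \to \infty \qquad \text{(statistical consistency)}$$

$$\Pr\left( \lVert \hat{\theta}_n - \theta^* \rVert > \epsilon \right) \le \delta(n, \epsilon) \qquad \text{(finite sample bound)}$$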

For many types of dynamical systems, learning is provably intractable. You must choose the right class of model, or else all bets are off!

Look into:

• Spectral Learning approaches to machine learning

Basener: Topological and Bayesian Methods in Data Science

• Topology: Encompasses the global shape of the data, and the relations between data points or groups within the global structure