From dcj@rkba.lkg.dec.com Thu Feb 2 06:59:24 1995
Received: from nova.unix.portal.com (root@nova.unix.portal.com [156.151.1.101]) by jobe.shell.portal.com (8.6.9/8.6.5) with ESMTP id GAA26939 for ; Thu, 2 Feb 1995 06:59:24 -0800
Received: from inet-gw-3.pa.dec.com (inet-gw-3.pa.dec.com [16.1.0.33]) by nova.unix.portal.com (8.6.9/8.6.5) with SMTP id GAA06496 for ; Thu, 2 Feb 1995 06:58:52 -0800
Received: from muggsy.lkg.dec.com by inet-gw-3.pa.dec.com (5.65/10Aug94) id AA03369; Thu, 2 Feb 95 06:21:37 -0800
Received: from cmtsrv.lkg.dec.com by muggsy.lkg.dec.com (5.65/DEC-Ultrix/4.3) with SMTP id AA14744; Thu, 2 Feb 1995 09:16:44 -0500
Received: from rkba.lkg.dec.com by cmtsrv.lkg.dec.com; (5.65/1.1.8.2/18Jan95-0400PM) id AA04996; Thu, 2 Feb 1995 09:20:47 -0500
Received: from localhost by rkba.lkg.dec.com; (5.65/1.1.8.2/06Jan95-0252PM) id AA01076; Thu, 2 Feb 1995 09:20:48 -0500
Message-Id: <9502021420.AA01076@rkba.lkg.dec.com>
To: chan@shell.portal.com
Cc: tat@well.sf.ca.us
Subject: another script
Date: Thu, 02 Feb 95 09:20:47 -0500
From: "Dennis C. Josifovich"
X-Mts: smtp
Status: RO

The following script is not optimal, but it gets most of the job done. It does NOT fold in the judiciary committee information. I'm assuming you have a standard unix machine to run this on and that I used only commands in a portable way (I may have goofed and done something your machine cannot do; I hope not).

There are two passes. The first pass extracts all the comments and places them at the beginning of the file. The second pass extracts the data lines, strips the double quotes and trailing whitespace, then sorts them and drops duplicates.

One day, I'll figure out how to easily add committee info, ... to the existing congress.info file. (I know, I know, you're going to call it congress.txt).

One note: there should be a space followed by a hard tab (^I) inside the brackets in 's/[ ]*$//g'. Please make sure this is still the case when you get it and save it to a file.

process-tat-files:

cat HOUSEALL.TXT SENATALL.TXT \
| tr -d '\015\032' \
| awk '/^#/ {print $0;}' \
| sed 's/[ 	]*$//g' \
> congress.info

cat HOUSEALL.TXT SENATALL.TXT \
| tr -d '\015\032' \
| awk -F, '!/^#/ && NF > 0 {print $0;}' \
| sed -e 's/"//g' -e 's/[ 	]*$//g' \
| sort -t , -u -f \
>> congress.info
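
Since a literal tab is easy to lose when mail is saved or re-wrapped, here is a minimal sketch of the same two passes with the tab produced by printf instead of typed into the sed expression. It assumes a POSIX shell with printf and mktemp available. The function name process_tat_files, the TAB variable, and the sample input files are hypothetical and are not part of the script above.

```shell
#!/bin/sh
# Sketch only: process_tat_files, TAB and the sample data are
# illustrative names, not part of the original script.

# Build the tab at run time so nothing depends on a literal ^I
# surviving a trip through mail.
TAB=$(printf '\t')

# usage: process_tat_files OUTPUT INPUT...
process_tat_files() {
    out=$1
    shift

    # Pass 1: keep comment lines (starting with '#'), drop CR and ^Z,
    # strip trailing spaces and tabs, and start the output file.
    cat "$@" \
    | tr -d '\015\032' \
    | awk '/^#/ {print $0;}' \
    | sed "s/[ $TAB]*\$//g" \
    > "$out"

    # Pass 2: keep non-comment, non-empty lines, remove double quotes
    # and trailing blanks, sort case-insensitively, drop duplicates,
    # and append to the output file.
    cat "$@" \
    | tr -d '\015\032' \
    | awk -F, '!/^#/ && NF > 0 {print $0;}' \
    | sed -e 's/"//g' -e "s/[ $TAB]*\$//g" \
    | sort -t , -u -f \
    >> "$out"
}

# Small made-up input in DOS format (CR line ends, trailing ^Z).
demo=$(mktemp -d)
printf '# House members \t\r\n"Smith","CA" \r\n\r\n' > "$demo/house.txt"
printf '# Senate members\r\n"Jones","NY"\t\r\n"Jones","NY"\r\n\032' > "$demo/senate.txt"

process_tat_files "$demo/congress.info" "$demo/house.txt" "$demo/senate.txt"
cat "$demo/congress.info"
rm -rf "$demo"
```

With the real data, the call would be process_tat_files congress.info HOUSEALL.TXT SENATALL.TXT. The sed expressions are double-quoted here so that $TAB expands, which is why the end-of-line anchor is written as \$.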