[ Ø ] Harsh Prakash

Quiet Musings on Cloud, Machine Learning, Big Data, Health, Disaster, et al.

Archive for the ‘aws’ tag

Sunsetting Python 2

without comments

Written by Harsh

December 10th, 2019 at 6:01 pm

Posted in Cloud, Programming


SM Automation and Service Catalog: When to use what

without comments

Systems Manager Automation can be used in cases where an automation workflow needs to be built for maintenance and deployment tasks. You can use a predefined Automation Document, or create your own for the desired task to define the actions that Systems Manager should perform on the specified resources. An Automation Document includes one or more steps that run in sequential order, each associated with a particular action that you specify.

E.g. You can have one step to launch an instance, and the next step in the Automation Document to do some actions on the newly created instance.

See user guides for Systems Manager Automation and Automation Document.
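As a concrete sketch of the two-step example above (the document name, AMI ID, and instance type are all placeholders, not from the original post):

```shell
# Write a minimal two-step Automation Document (schema 0.3):
# step 1 launches an instance, step 2 runs a command on it.
cat > launch-and-configure.yaml <<'EOF'
schemaVersion: '0.3'
description: Launch an instance, then act on the new instance
mainSteps:
  - name: launchInstance
    action: aws:runInstances
    inputs:
      ImageId: ami-xxxxxxxx
      InstanceType: t3.micro
  - name: configureInstance
    action: aws:runCommand
    inputs:
      DocumentName: AWS-RunShellScript
      InstanceIds: '{{ launchInstance.InstanceIds }}'
      Parameters:
        commands:
          - echo configured
EOF

# Register it as an Automation Document (needs AWS credentials):
command -v aws >/dev/null && \
  aws ssm create-document \
    --name LaunchAndConfigure \
    --document-type Automation \
    --document-format YAML \
    --content file://launch-and-configure.yaml \
  || echo "aws CLI not configured; document written locally only"
```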

On the other hand, Service Catalog lets sysadmins manage a catalog of infrastructure Products and organize them into Portfolios to which end users can be granted access. This way, your fav sysadmin can control the list of Products that end users are allowed to deploy on AWS.

See sorting and FAQs.

When should I use Systems Manager Automation to launch an instance vs. Service Catalog to launch an instance?

Well, it depends on your individual use case.

Systems Manager Automation enables you to automate administrative and management tasks performed on instances and other AWS resources. This helps you simplify complex tasks by performing them across large groups of instances.

Service Catalog is used to group your resources in CloudFormation templates, called Products, by IAM groups, users or roles. You can group and administer those Products in Portfolios, which can be shared with your accounts. Service Catalog also gives you an abstraction layer: even if your end users do not have IAM permissions to create EC2 instances, for example, they can still create instances by provisioning a Product that does so, as long as they have access to that Product. This allows administrators to provision applications for end users by setting configurations within Portfolios.

Pricing

For Systems Manager, you pay only for what you use and are charged based on the number and types of steps.

In Service Catalog, you pay a fixed fee of $5 per month for each portfolio of Products with assigned users.

In Systems Manager Automation, there are a few predefined Automation Documents which can be utilized for the tasks you would like to perform on your resources. If there is no predefined Automation Document available for the specific task that you would like to perform, you can build your custom Automation Document utilizing the Automation Actions.

In Service Catalog, you would have to create a custom CloudFormation template for each task you would like to perform and create the products in your Portfolio.
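For instance (a sketch; the template, bucket URL, and product details are all made up), a single-task Product could be created like so:

```shell
# A one-resource CloudFormation template to serve as the Product:
cat > launch-instance.template.yaml <<'EOF'
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-xxxxxxxx
      InstanceType: t3.micro
EOF

# After uploading the template to S3, register it as a Product
# (then associate it with a Portfolio):
command -v aws >/dev/null && \
  aws servicecatalog create-product \
    --name launch-instance \
    --owner sysadmin \
    --product-type CLOUD_FORMATION_TEMPLATE \
    --provisioning-artifact-parameters \
      'Name=v1,Type=CLOUD_FORMATION_TEMPLATE,Info={LoadTemplateFromURL=https://s3.amazonaws.com/my-bucket/launch-instance.template.yaml}' \
  || echo "aws CLI not configured; template written locally only"
```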

Differences between Systems Manager Automation Document and Service Catalog Template

For a few sample codes for Actions that can be specified in a Systems Manager Automation
Document, see this user guide.

You can view the predefined Automation Documents from your Console by following the
steps below:
1. Open Systems Manager Console.
2. Choose Documents from the navigation pane on the left.
3. In the search filter, choose Document Type and Automation as the value.
4. From the list of Automation Documents displayed, click on a Document to open
the details page.
5. Choose the Content tab to view the Document content.

For a number of sample CloudFormation templates for Service Catalog, see these user guides – 01 and 02.

How would I choose between Systems Manager Automation and Service Catalog for each of the tasks below, i.e. which is best suited for: launching a new instance, patching an existing instance, terminating an instance, and scheduling the start and stop of an instance?

If you have a use case where an administrator would like to grant the IAM users performing these tasks limited access to these services, you can use Service Catalog to create a Portfolio and only allow certain IAM users/roles to deploy the Products configured in that Portfolio.

Otherwise, if you would only like to automate these tasks, Systems Manager Automation is more suitable.

E.g. 1. Launching a new instance: To launch a new instance using Automation, you can use the action ‘aws:runInstances’ in your custom Automation Document to launch a new instance. See the user guide for run instance.

E.g. 2. Patching an existing instance: There is a predefined Automation Document which can be utilized. See the user guide for patch instance.

E.g. 3. Terminating an instance: For a predefined Automation Document that can be used to terminate EC2 instances, see the user guide for terminate instance.

E.g. 4. Schedule a start and stop of an instance: There are predefined Automation Documents for Starting and Stopping EC2 instances which can be scheduled to run at the desired time using a Maintenance window. See these user guides – 01 and 02.

If you would like to perform all these tasks in an automated way on a schedule, you can utilize Systems Manager Maintenance window to achieve this. See the user guide for maintenance.
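A sketch of that schedule (window name, cron expression, and durations are example values; the task registration is only referenced in a comment):

```shell
# Create a weekly Maintenance Window (runs Sundays at 02:00 UTC):
out=$(command -v aws >/dev/null && \
  aws ssm create-maintenance-window \
    --name weekly-stop-start \
    --schedule 'cron(0 2 ? * SUN *)' \
    --duration 2 \
    --cutoff 1 \
    --allow-unassociated-targets 2>&1 \
  || echo "skipped: aws CLI not configured")
echo "$out"
# Then register AWS-StopEC2Instance / AWS-StartEC2Instance as
# AUTOMATION tasks via 'aws ssm register-task-with-maintenance-window'.
```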

For Service Catalog, you would have to build your own CloudFormation template to perform each task. This template can be provided while creating a Product in your Portfolio. See these user guides – 01 and 02.

Written by Harsh

December 4th, 2019 at 6:11 pm

Posted in Cloud


Aamees

without comments

This was announced today, so some guidance on what should be the preferred way(s) to create and share AMIs, going forward:

Via Cloud management like CloudTamer
You can share AMIs (and Service Catalog Portfolios) in CloudTamer. Specifically, you can create AMIs under “Cloud Management” > “AWS AMIs”. Note, a share is fully managed by CloudTamer, so an AMI will only be shared with AWS accounts that have a CloudRule containing it, and all other AWS accounts will lose access to it if it was shared with them separately. Once we have permission to share these at the Project or OU level, AMIs won’t need to be re-shared with new accounts as they are on-boarded.

Via Service Catalog
I created a product for this in July – it creates an AMI from an EC2 instance (not just an EC2 instance from an accessible AMI) by implementing a Custom Resource [Custom::AMI] that calls the CreateImage API on Create (and the DeregisterImage and DeleteSnapshot APIs on Delete). Custom Resources enable writing custom Lambda-backed logic in templates.
Basically, it starts off with a base image to create an EC2, then patches it, creates an image out of it, and deletes the EC2 when all is done – Essentially, automating what most Cloud shops do anyway. With this approach, you can enable your customers to mint a new AMI on-demand. Wee!
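The shape of that template, heavily elided (resource names are illustrative, and the Lambda body plus its IAM role are left out):

```shell
# Fragment of the Product's CloudFormation template: a Custom::AMI
# resource backed by a Lambda that calls CreateImage on Create and
# DeregisterImage/DeleteSnapshot on Delete.
cat > ami-product.fragment.yaml <<'EOF'
Resources:
  AMI:
    Type: Custom::AMI
    Properties:
      ServiceToken: !GetAtt AMIFunction.Arn  # Lambda implementing the custom resource (not shown)
      InstanceId: !Ref Instance              # the patched EC2 to image
EOF
```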

Via AMI Factory/SM Automation/Resource Access Manager (RAM)/EC2 Image Builder (from re:Invent 2019)

Via the traditional workflow
You can share AMIs by adding new account numbers to the share. This approach should be scripted.

PS: Should add that AMI IDs can be “hidden” in Parameter Store $vars for all of the above.
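E.g. (the parameter name and AMI ID here are made up):

```shell
# Stash the AMI ID under a well-known Parameter Store name, then
# resolve it at use time instead of hard-coding the ID:
out=$(command -v aws >/dev/null && {
    aws ssm put-parameter \
      --name /golden/ami-id \
      --type String \
      --value ami-0abcdef1234567890 \
      --overwrite
    aws ssm get-parameter \
      --name /golden/ami-id \
      --query Parameter.Value \
      --output text
  } 2>&1 || echo "skipped: aws CLI not configured")
echo "$out"
```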

Written by Harsh

December 2nd, 2019 at 6:06 pm

Posted in Cloud


From SSH to SSM (a.k.a. the demise of bastion host)

without comments

To avoid having to manage very many SSH keys, sysadmins used ssh-user. AWS solves that headache with ssm-user, which is only for administering existing infrastructure. To provision new infrastructure, as should be the end goal in a Cloud shop anyway (think cattle, not pets; a shitty way to look at animals, but anyway…), you need neither SSM nor SSH.

Traditionally, you find “what” someone does, like so –

$ sudo cat secure | tail -n 3
Aug 6 10:40:16 hostname sshd[12345]: pam_unix(sshd:session): session opened for user username by (uid=0)
Aug 6 10:40:40 hostname sudo: username : TTY=pts/3 ; PWD=/var/log ; USER=ssh-user ; COMMAND=/usr/bin/bash
Aug 6 10:40:56 hostname sudo: ssh-user : TTY=pts/3 ; PWD=/var/log ; USER=root ; COMMAND=/bin/cat secure

For tracking, the one change that can be implemented stat is to require all teams to start using TOKENs, not keys, for GitHub check-ins so traceability of their provisioning code can be maintained.

With these, if teams need to log in to an existing system, you can trace who all logged in, who all invoked ssh-user, and what ssh-user did. So, you can narrow down a misstep to the group that last invoked ssh-user, but not to a specific individual. (Note, while SSM still logs everyone in as ssm-user, it tracks sessions separately by user and can narrow it down to an individual.)
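E.g., pulling the last user to invoke ssh-user out of the secure log (the log lines below are synthetic):

```shell
# Synthetic /var/log/secure excerpt:
cat > secure.sample <<'EOF'
Aug 6 10:40:16 hostname sshd[12345]: pam_unix(sshd:session): session opened for user alice by (uid=0)
Aug 6 10:40:40 hostname sudo: alice : TTY=pts/3 ; PWD=/var/log ; USER=ssh-user ; COMMAND=/usr/bin/bash
Aug 6 10:40:56 hostname sudo: ssh-user : TTY=pts/3 ; PWD=/var/log ; USER=root ; COMMAND=/bin/cat secure
EOF

# The invoking user is field 6 of the sudo lines; keep the last one:
last_invoker=$(awk '/USER=ssh-user/ { user=$6 } END { print user }' secure.sample)
echo "$last_invoker" # → alice
```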

Another workaround you can put in place is to manage keys via .ssh/config, at least in the interim. But eventually, you should move to SSM in an AWS shop.
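A minimal .ssh/config entry for that interim setup (host alias, address, and key path are illustrative):

```shell
# With an entry like this in ~/.ssh/config, 'ssh app-prod' picks the
# right key automatically, so no more juggling -i flags:
cat > ssh_config.sample <<'EOF'
Host app-prod
    HostName 10.0.1.15
    User ec2-user
    IdentityFile ~/.ssh/app-prod.pem
    IdentitiesOnly yes
EOF
```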

And I’d rather have teams working in an IDE from their local laptops than in a text editor on a server instance (admit it, vi or Emacs isn’t everyone’s cup of tea). For that, since we’d need Linux running on Windows for Ansible etc., take your pick between WSL and VirtualBox. Either way should be a-okay – in the past, I used WSL for access (with temp keys via STS), and found a sub-system less bloated than a full-fledged box. And there was this sweet thing.

This is important – especially as we treat infrastructure = code and engineers/sysadmins = developers/builders, we should allow engineers/sysadmins the same DevOps tools: an IDE to work in locally, etc. That means each engineer/sysadmin forks her own repo and pushes changes out in true Git fashion (with retention/archiving handled by versioning).

With that, you pretty much kill the need for most intermediate server hosts.

Written by Harsh

August 6th, 2019 at 3:30 pm

Posted in Cloud


Vault

without comments

Ansible-vault can be in any format, just as long as you can retrieve from it –
* If YML (recommended), it’s easier to parse – msg: "{{ key }}"
* If JSON, then you’ll have to parse it like so – msg: "{{ vault_var['key'] }}"

For Ansible older than version 2.4 | Version 2.4 or newer | Description

* --ask-vault-pass | --vault-id @prompt | It’ll prompt for your vault password when you run the playbook
* --vault-password-file | --vault-id file.yml | It’ll look for your vault password in file.yml

See SOP.
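A quick sketch of both flag styles (file names and the password are made up):

```shell
# A YAML vault file with one secret, plus a password file:
cat > vault.yml <<'EOF'
key: s3cr3t
EOF
echo 'example-pass' > .vault_pass.txt

# Encrypt it in place (2.4+ --vault-id syntax):
command -v ansible-vault >/dev/null && \
  ansible-vault encrypt --vault-id default@.vault_pass.txt vault.yml \
  || echo "ansible-vault not installed; vault.yml left in plaintext"

# Running a playbook against it:
#   2.4 or newer:    ansible-playbook --vault-id default@.vault_pass.txt site.yml
#   older than 2.4:  ansible-playbook --vault-password-file .vault_pass.txt site.yml
```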

Written by Harsh

August 5th, 2019 at 2:33 am

Posted in Cloud


Giving ML the RT

without comments

So, how does Machine Learning fare against the Rorschach Test?

Rorschach Blot | Popular Responses | Labels Found by AWS Rekognition
Blot 1 | bat, butterfly, moth | Stain, Weaponry, Weapon
Blot 2 | humans, animal | Art, Modern Art, Heart, Paint Container
Blot 3 | human | Art, Animal, Bird, Modern Art, Drawing, Painting, Leisure Activities, Dance, Dance Pose, Stain
Blot 4 | animal hide, skin, rug | Art
Blot 5 | bat, butterfly, moth | Plant, Leaf, Silhouette, Symbol, Dog, Mammal, Canine, Pet, Animal, Bird, Arrow
Blot 6 | animal hide, skin, rug | Arrowhead, Bird, Animal
Blot 7 | human, head | Nature, Outdoors, Animal, Bird, Chicken, Poultry, Fowl, Weather, Ground, Stain
Blot 8 | animal | Stain, Art, Painting
Blot 9 | human | Stain, Painting, Art, Modern Art, Canvas, Graphics, Paint Container, Advertisement, Poster, Plot
Blot 10 | crab, spider | Art, Painting, Graphics, Modern Art, Pattern
(Blot images not reproduced.)

▲ Applied ML (Machine Learning)
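For reference, labels like those above come from a call of this shape (bucket and object names are made up):

```shell
# Ask Rekognition for up to 10 labels at >= 50% confidence:
out=$(command -v aws >/dev/null && \
  aws rekognition detect-labels \
    --image '{"S3Object":{"Bucket":"my-bucket","Name":"blot-01.png"}}' \
    --max-labels 10 \
    --min-confidence 50 \
    --query 'Labels[].Name' \
    --output text 2>&1 \
  || echo "skipped: aws CLI not configured")
echo "$out"
```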

Related:
* “MIT researchers: Amazon’s Rekognition shows gender and ethnic bias (updated)”
* Solar Dynamics Observatory (SDO)
* Global MapAid

Written by Harsh

July 10th, 2019 at 7:10 pm

In Retrospect: Lessons & Tips from a Large Federal Implementation

without comments

Last year, I presented on “Esri WebGIS Platform – How we implemented ArcGIS, and you can too” at FedGIS. This year, I shared another summary – lessons and tips from that implementation. That is especially helpful if you are dealing with the unique security responsibilities of the federal government around high-value PII/PHI-based data assets and Expedited Life Cycle (XLC) processes.

From a technical perspective, I shared how we implemented a hybrid and disconnected ArcGIS design inside a 3-zone architecture with multi-VPN and multi-NIC networks on Red Hat Enterprise Linux.

From a high-level management perspective, I shared how that played out inside the federal environment.

Esri ArcGIS Federal

Written by Harsh

March 26th, 2018 at 9:10 pm

Esri WebGIS Platform

without comments

Customer Need, Deployment Option, Authority to Operate (ATO), Challenges, Solutions, Lessons

Esri WebGIS Platform

Written by Harsh

September 13th, 2017 at 4:11 pm

Esri in AWS Cloud

without comments

Background, Options, Opportunities

Esri in AWS Cloud

Written by Harsh

December 17th, 2016 at 11:48 pm

Geodata Based Decisions

without comments

How to Use Location Analytics to Improve the Effectiveness of Public-Facing Sites

Geodata Based Decisions

Written by Harsh

March 17th, 2016 at 11:47 pm

Posted in GIS, Health, Management, Technology


HowTo: Run ‘ArcGIS for Server Advanced Enterprise’ (10.3.1) on Amazon EC2 Red Hat Enterprise Linux (7)

without comments

The talks on ArcGIS Server at ESRI Health GIS were fun, but I wanted more – specifically, to install and administer its latest release on Amazon Web Services, all via the trusted command line. Here’s how I did that:

To follow along, get an EDN license and an AWS account. Especially if you have been in the industry for long, there’s no good excuse not to have those with the biggest companies in GIS and da Cloud (and while you are at it, get Mapbox and CartoDB accounts too).


### Setup the stage ###
# Downloaded its AWS key from //aws.amazon.com/console/ and connected to my instance (ensured it matched the min. system requirements) using its public DNS (if you restart your instance, this will change). Note I SSHed using Cygwin instead of PuTTy.
$ ssh -i "key.pem" ec2-user@#.#.#.#.compute.amazonaws.com
$ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.1 (Maipo) # Even though I used RHEL-7.0_HVM_GA-20141017-x86_64-1-Hourly2-GP2 by Red Hat (I later found out that ESRI provides its own AMI)
$ sudo yum upgrade
$ sudo yum update
$ sudo yum install emacs # For that college-dorm smell, no offense Nano/Vi
$ sudo emacs ~/.bashrc
    force_color_prompt=yes # If you haven't already... (Ignored the embedded rant and uncommented this line to make the prompt colored so it was easier to read in-between)

### Setup the instance ###
# I used a M4.LARGE instance with a 20GB EBS volume (in the same Availability Zone, of course) - ensured it didn't go away if I were to terminate the instance. Then, I extended the partition to exceed the min. space requirements (took a snapshot first) - unfortunately, AWS docs didn't help much with that.
$ df -h
> ...
$ lsblk # Listed block partitions attached to the device. Since there was a gap in sizes between the partition and the device (and there were no other partitions), I resized the child partition "XVDA2" (the root file system where I would finally install ArcGIS Server) to use up the surplus space on its parent disk "XVDA".
> NAME SIZE TYPE MOUNTPOINT
> xvda 20G disk
> |_xvda2 6G part /
# First, updated its metadata in the partition table
$ sudo yum install gdisk # Since disk label was GPT
$ sudo gdisk /dev/xvda
$     print # Noted the start sector
$     delete
$     new
$     #### # Used the same start sector so that data is preserved
$     \r # For the max. last sector
$     # # Used the same partition code
$     print
$     write
$     y
# Next, updated the actual XFS file system
$ sudo xfs_growfs / # This is the actual change for XFS. If 'df -T' reveals the older EXT4, use 'resize2fs'.
# Then, confirmed to see if the boot sector was present so that stop-start will work
$ sudo file -s /dev/xvda # Bootloader
# Finally, rebooted the instance to reflect the new size
$ sudo reboot

### Onto GIStuff ###
# WinSCPed and untarred the fresh-off-the-press 1GB release
$ tar -xvf ArcGIS_for_server_linux_1031_145870.gz
# Got the right ECP#########?
$ ./Setup # Started headless installation - try "--verbose" if you run into other issues
# Hit a diagnostics roadblock: File handle limits for the install user were required to be set to 65535 and the number of processes limits to 25059. So...
$ sudo emacs /etc/security/limits.conf
$     ec2-user soft nofile 65535
$     ec2-user hard nofile 65535
$     ec2-user soft nproc 25059
$     ec2-user hard nproc 25059
# Logged out, logged back in, verified
$ ulimit -Hn -Hu
$ ulimit -Sn -Su
$ ./Setup

### Authorize, authorize, authorize! ###
# Created and uploaded authorization.txt, and downloaded authorization.ecp from //my.esri.com/ -> "My Organization" -> "Licensing" -> "Secure Site Operations"
$ locate -i authorization.ecp
$ readlink -f authorization.ecp
$ ./authorizeSoftware -f /path/authorization.ecp
$ ./authorizeSoftware -s # s=status, not silent
$ ./startserver.sh
$ netstat -lnp | grep "6080" # Confirmed owned processes - that it was listening on the default TCP@6080 (port is only required if you don't have the Web Adapter)
# Ensured IP and domain were listed correctly in the hosts file (e.g. Single IP may be mapped to multiple hosts, both IPv4 and IPv6 may be mapped to a single host, etc.)
$ hostname
$ emacs /etc/hosts
$     127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
$     ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
$     #.#.#.# localhost localhost.localdomain localhost4 localhost4.localdomain4
# But wait, before I could browse to my site from a public browser, I needed to add this Inbound Rule to the Security Group attached to the instance
Custom TCP rule TCP 6080 0.0.0.0/0
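The same rule can be added from the CLI (the security group ID here is a placeholder):

```shell
# Open TCP 6080 to the world on the instance's security group:
out=$(command -v aws >/dev/null && \
  aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 6080 \
    --cidr 0.0.0.0/0 2>&1 \
  || echo "skipped: aws CLI not configured")
echo "$out"
```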

### Browser ahoy! ###
//#.#.#.#:6080/arcgis/manager or //machinename:6080/arcgis/manager
ArcGIS Server Setup Wizard -> Create New Site
Primary Site Administrator -> Create Account # Stored with the site, not the OS
# Must be local and accessible from every machine in your site
    Root Server Directory: /home/username/arcgis/server/usr/directories # To store output images, etc.
    Configuration Store: /home/username/arcgis/server/usr/config-store # To hold info about the server's machines, services, directories, etc.
# This is when I ran into "0x80040154 - Could not create object 'ConfigurationFactory'". So, went digging through the logs...
$ cat /home/ec2-user/arcgis/server/usr/logs/EC2/server/server-...log
> ...
> Cluster 'default' successfully created.
> Failed to create the site. com.esri.arcgis.discovery.servicelib.AGSException: java.lang.Exception: AutomationException: 0x80040154 - Could not create object 'ConfigurationFactory'.
> Disconnecting the site from the configuration store.
...
# Back to the server: File/directory permission issue? Nope. The issue turned out to be missing packages, even though the pre-installation dependencies check had passed. All 15 listed below:
$ sudo yum list installed
$ sudo yum install wget
$ wget http://vault.centos.org/6.2/os/x86_64/Packages/xorg-x11-server-Xvfb-1.10.4-6.el6.x86_64.rpm
$ sudo yum localinstall xorg-x11-server-Xvfb-1.10.4-6.el6.x86_64.rpm
$ sudo yum install Xvfb # Else "Unable to start Xvfb on any port in the range 6600-6619"
$ sudo yum install freetype
$ sudo yum install fontconfig
$ sudo yum install mesa-libGL
$ sudo yum install mesa-libGLU
$ sudo yum install redhat-lsb
$ sudo yum install glibc
$ sudo yum install libXtst
$ sudo yum install libXext
$ sudo yum install libX11
$ sudo yum install libXi
$ sudo yum install libXdmcp
$ sudo yum install libXrender
$ sudo yum install libXau
# Cleanliness is next to godliness, or so my Catholic school nuns would say
$ sudo yum clean all
$ cd /tmp/
$ sudo rm -r *
$ logout

### Back to the browser ###
//#.#.#.#:6080/arcgis/manager/
# At the end, added SSL using a self-signed certificate
//#.#.#.#:6080/arcgis/admin/
Custom TCP rule TCP 6443 0.0.0.0/0 # Added this rule to the group on AWS first

### Uninstall? ###
$ ./stopserver.sh
$ ./uninstall_ArcGISServer
# rm folders after done

Conclusion: 6443 or 8443?

After years of doing this with first ESRI (PROD), then MapServer (PROD) and GeoServer (DEV), I went back to the dark ahem ESRI side. And what do I keep finding? That the big two are blending together in terms of looks. E.g. The console of the other Java-powered mapping server, GeoServer, is looking similar to that of its big brother on-steroids. The third, MapServer, somewhat paradoxically on the other hand, has both come a long way (MapCache and ScribeUI, yay!) and still lost ground.

Next up, testing Tippecanoe.

PS:
* I tried both 10.3.1 and 10.0 on Ubuntu (15.04), unsupported. While both installed, site creation didn’t work because of missing packages – searching through apt-cache didn’t help either. On Windows, there is always their CloudBuilder.

Related:
* GeoNet
* Landsat on AWS in ArcGIS

Written by Harsh

September 28th, 2015 at 7:43 pm

#HealthGIS: Notable links and final thoughts on the conference

without comments

Health websites using ESRI ++

    * With ArcGIS JavaScript
        • CDC’s Division for Heart Disease and Stroke Prevention (DHDSP) Atlas

    * With ArcGIS Server / ArcGIS Online (via Apache Flex)
        • HealthLandscape’s Accountable Care Organization (ACO) Explorer
        • Dartmouth’s Atlas (try generate KML)
        • NMQF’s Methicillin-resistant Staphylococcus aureus (MRSA) mapping
        • HRSA’s Datawarehouse

Health websites whose global participants have trouble with software licenses ++

    * With OpenLayers and DHIS2 (~ an opensource InstantAtlas)
        • PEPFAR’s (a president’s best legacy) Data for Accountability, Transparency and Impact (DATIM) – coming soon to GeoServer + MapLoom and OpenLayers

    * Even Highmaps(!)
        • NCHS’s Health Indicators Warehouse (HIW)

    * Many More…

Dots

Clearly, there’s no shortage of health data or technologies, esp. following ACA’s requirements of uniform data collection standards, just a continuing kerfuffle with overlaying disparate JSON/OGC tiles from their many data owners and manifold service endpoints. Unfortunately, only part of this problem is technical. Take Flu mapping, for instance. CDC, WHO, WebMD (with MapBox) and Google, even Walgreens does it. Or take HIV mapping where you can choose from CDC and NMQF, among others. Even anonymized private claims data is available for a couple of Ks a month. I think a bigger part of the problem is the misalignment between vendors’ business interests and mandates of various agencies and goals of the health research community at large.

Connect

At some point, researchers and epidemiologists would want to see how these data tiles correlate to each other. And GIS professionals would want a quicker way to ‘overlay this layer’ without having to dig through Firebug. And compress it over the wire, while you are at it (when our users in remote Africa were asked to switch off their smartphones to view desktop maps, we understood data compression a little differently).

Crunch

And then they would want to analyze them, be it on the server with Big Data or in the client with smaller ones. On analyses, your favorite GIS continues to take heat from tools like Tableau among conference attendees.




Mapping Visible Human with Deep Zoom (It’s Interactive!)

Overall, a growing use of ArcGIS Server’s publisher functionalities and a compelling body of story map templates leveraging its narrative text capabilities. E.g. Atlas for Geographic Variation within Medicare. On publishing, I suspect some researchers would like to see a Mapbox plugin for QGIS. Yes, you can render and upload maps from TileMill to your Mapbox account, but CartoDB has QgisCartoDB, where you can view, create, edit or delete data from QGIS to your CartoDB account (I needn’t add that Python-powered QGIS remains a favorite among matplotlib-loving researchers).

PS: My ranking of how easy it is to connect to federal health datasets –
1. CDC (E.g. NCHS, Wonder, Health Indicators)
2. CMS (E.g. DNAV, Medicare – try Hospital Compare – Info, Spreadsheet, JSON)
3. HRSA (E.g. Datawarehouse).

Related:
* CDC’s GIS Resources
* CDC’s Submit Maps
* Hospital Referral Region (HRR) – A regional market area for tertiary medical care
* Health Savings Account (HSA) – A tax-advantaged medical savings account available to some taxpayers

++ While log analyses attest that mono-themed web maps provide a better user experience, given the nature of health data and the costs behind spinning off another mapp (yup, blended words to make a portmanteau), sometimes you just have to combine themes.

Written by Harsh

September 21st, 2015 at 8:10 pm